Abstract

Given  a  finite  collection  of  classifiers  one  might  wish  to  combine,  or  fuse,  the  classifiers  in 
hopes  that  the  multiple  classifier  system  (MCS)  will  perform  better  than  the  individuals.  One 
method of fusing classifiers is to combine their final decisions using Boolean rules (e.g., a logical OR,
AND,  or  a  majority  vote  of  the  classifiers  in  the  system).  An  established  method  for  evaluating 
a  classifier  is  measuring  some  aspect  of  its  Receiver  Operating  Characteristic  (ROC)  curve,  which 
graphs  the  trade-off  between  the  conditional  probabilities  of  detection  and  false  alarm.  This  work 
presents  a  unique  method  of  estimating  the  performance  of  an  MCS  in  which  Boolean  rules  are 
used  to  combine  individual  decisions.  The  method  requires  performance  data  similar  to  the  data 
available  in  the  ROC  curves  for  each  of  the  individual  classifiers,  and  the  method  can  be  used  to 
estimate  the  ROC  curve  for  the  entire  system.  A  consequence  of  this  result  is  that  one  can  save  time 
and  money  by  effectively  evaluating  the  performance  of  an  MCS  without  performing  experiments. 
EVALUATING THE PERFORMANCE OF MULTIPLE CLASSIFIER SYSTEMS:
A  MATRIX  ALGEBRA  REPRESENTATION  OF  BOOLEAN  FUSION  RULES 

I.  Introduction 

1.1  General  Discussion  and  Background 

The ability to accurately detect and identify targets is an important issue for the U.S. Air
Force  and  the  Department  of  Defense.  The  military  departments  traditionally  wish  to  determine 
whether  a  particular  object  is  hostile  or  friendly  (target  or  clutter,  foe  or  friend,  etc.).  Similar 
problems  exist  in  many  fields.  For  example,  members  of  the  medical  community  may  wish  to 
distinguish  between  cancerous  and  benign  cells,  or  a  mortgage  company  might  attempt  to  discern 
a  fit  borrower  from  one  who  is  likely  to  default  on  a  loan. 

All of these problems belong to the broad field of classification, which offers
several approaches for solving them. Methods range from something as simple as a
visual inspection of the object of interest to a more rigorous approach, such as the linear discriminant
function developed by Fisher [7]. Another common approach to modern classification problems is
the use of artificial neural networks, which use algorithms that learn how to classify data. In the
early 1980s, scientists and engineers began examining the idea of using groups of classifiers, or
multiple classifier systems (MCSs), in hopes of increasing accuracy, and research in this area of the
field continues today [23].

1.2  Problem  Description 

Researchers are often concerned with evaluating the performance of a classifier or an MCS.
The systems employed in these applications usually use sensors to collect data and
other mechanisms, called classifiers, to classify each observation. Figure 1 shows a notional single
classifier system. The data collected by the sensor, called features, are typically converted by the
classifier to some numerical value. If this value is greater than a nominal threshold value, the
observation is placed into one class. Otherwise, the observation is placed into the other class.

Figure 1: Notional Single Classifier System.

A  multiple  classifier  system  is  one  in  which  multiple  sensor/classifier  ensembles  collect  data 
and  reach  conclusions  independently.  Those  decisions  are  then  combined  in  some  manner  to  reach  a 
decision  for  the  system.  A  system  designer  might  want  to  build  a  system  that  makes  use  of  multiple 
classifiers  for  various  reasons.  Different  classifiers  may  be  trained  to  detect: 

•  Different  types  of  targets. 

•  Different  attributes  of  the  same  target. 

•  The  same  target  under  different  operating  conditions. 

In  a  multiple  classifier  system  there  are  potentially  several  events  observed  and  several  streams 
of  data  collected.  Figure  2  depicts  the  design  of  a  notional  two-classifier  system  in  which  the  final 
classifier  decisions  are  combined.  Combining  the  final  decision  of  each  classifier  in  the  system  is  not 
the  only  method  of  combination  in  an  MCS,  but  discussion  in  Chapter  2  explains  why  researching 
this  type  of  classifier  combination  is  a  worthwhile  pursuit,  especially  for  military  applications. 
Figure 2: Notional Multiple Classifier System with Two Classifiers.


1.3  Research  Objectives 

The  goal  of  this  research  is  to  provide  some  mathematical  insight  into  the  process  of  evaluating 
MCSs.  Specifically,  the  purpose  of  the  research  is  to  identify  properties  that  may  aid  the  analyst  or 
engineer  in  evaluating  an  MCS  by  using  the  data  available  in  the  Receiver  Operating  Characteristic 
(ROC)  curves  from  the  individual  classifiers  in  the  ensemble.  The  primary  contribution  of  the 
thesis  is  a  formula  for  computing  ROC  curve  values  for  an  MCS  using  (a)  the  ROC  curves  from 
the  individual  classifiers  in  the  system,  and  (b)  any  Boolean  rule  for  combining  the  decisions  from 
the  individuals. 
II. Literature Review


2.1 Introduction/Overview

This  chapter  reviews  the  literature  regarding  two-class  classification  or  detection.  Receiver 
operating  characteristic  (ROC)  curves  are  described  as  a  visual  method  for  evaluating  classifier 
performance,  and  methods  for  comparing  classifiers  using  ROC  curves  are  discussed.  Then  the 
concepts  of  combining  and  fusing  classifiers  are  introduced,  followed  by  a  discussion  of  various 
systems  of  multiple  classifiers  and  the  different  methods  used  to  bring  together  results  from  the 
individual  classifiers  to  optimize  the  performance  of  the  system.  This  discussion  will  include  an 
introduction  to  the  concept  of  constant  false  alarm  rate  (CFAR)  fusion,  where  the  system  is  designed 
to  yield  the  best  possible  detection  rate  while  maintaining  an  acceptable  number  of  false  alarms. 
There  will  also  be  considerable  discussion  of  simple  logical,  or  Boolean,  rules  for  combining  classifier 
outputs. 

2.2  Receiver  Operating  Characteristic  (ROC)  Curves 

2.2.1 ROC Curve Background. Receiver operating characteristic (ROC) curves are one
way of describing the performance of a classifier. Once the classifier makes a decision (friendly or
hostile), there are four possible results, or output states. The classifier can:

1. Correctly identify a hostile target.

2. Misclassify a hostile target as friendly.

3. Misclassify a friendly target as hostile.

4. Correctly identify a friendly object or clutter.

Using Egan’s terminology [6], the probability of scenario one (conditional upon the existence
of a hostile target) corresponds with the hit rate or probability of detection and will be denoted
Pd. Scenario two corresponds with the miss rate and will be represented by Pm. Scenario three
corresponds with the false alarm rate and will be denoted Pfa. Finally, the term Pc is used to
signify scenario four, a correct rejection. Table 1 lists the conditional probabilities for a classifier.
Note that Pd + Pm = 1 and Pfa + Pc = 1.

Table 1: Conditional Classification Probabilities.

Scenario                    Notation   Meaning
1. Hit Rate                 Pd         Pr{Hostile Classification | Hostile Target Present}
2. Miss Rate                Pm         Pr{Friendly Classification | Hostile Target Present}
3. False Alarm Rate         Pfa        Pr{Hostile Classification | No Hostile Targets Present}
4. Correct Rejection Rate   Pc         Pr{Friendly Classification | No Hostile Targets Present}

Figure 3: Typical Receiver Operating Characteristics (ROC) Curve.

The ROC curve is a graph of the trade-off between the target detection rate (Pd) and the false
alarm rate (Pfa) for a particular classifier. A notional ROC curve is shown in Figure 3. Typically,
the ROC curve is constructed by varying the decision threshold (θ) for the classifier and plotting
the observed values for Pd and Pfa; therefore, a ROC curve for a typical classifier is actually a
two-dimensional projection of an object in 3-space (θ, Pd, Pfa) [2].
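The threshold sweep described above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `roc_points` and the score lists are hypothetical, and the classifier is assumed to declare hostile whenever its numerical output exceeds θ.

```python
def roc_points(hostile_scores, friendly_scores, thresholds):
    """Return one (theta, Pd, Pfa) triple per threshold.

    Pd is the fraction of hostile observations scored above theta;
    Pfa is the fraction of friendly observations scored above theta.
    """
    points = []
    for theta in thresholds:
        pd = sum(s > theta for s in hostile_scores) / len(hostile_scores)
        pfa = sum(s > theta for s in friendly_scores) / len(friendly_scores)
        points.append((theta, pd, pfa))
    return points


# Hypothetical classifier outputs for objects whose true class is known.
hostile = [0.9, 0.8, 0.6, 0.4]
friendly = [0.7, 0.3, 0.2, 0.1]

curve = roc_points(hostile, friendly, [0.0, 0.25, 0.5, 0.75, 1.0])
for theta, pd, pfa in curve:
    print(f"theta={theta:.2f}  Pd={pd:.2f}  Pfa={pfa:.2f}")
```

Dropping the first coordinate of each triple gives the familiar two-dimensional (Pfa, Pd) plot.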
Egan notes that the slope at any point on a ROC curve is equal to the likelihood ratio for a
particular observation x. The likelihood ratio is computed with the function

\[ L(x) = \frac{\Pr\{x \mid H\}}{\Pr\{x \mid F\}}, \]

and is a measure of the strength of the evidence carried by the observation. He points out that
a decision rule can be based on this measure (e.g., if L(x) > θ_L(x), conclude hostile). That is, if
the evidence is stronger than some predetermined value θ_L(x), the classifier will determine that the
observation is hostile. Thus, higher values of θ_L(x) correspond with more conservative decisions,
because the classifier requires more evidence. Decreasing θ_L(x) yields a more aggressive decision rule.

The alternative is for the classifier to base its decision on the actual observation x. If L(x) is
monotonic with respect to x, these decision rules are similar (e.g., if L(x) > θ_L(x) or x > θ_x, conclude
that the object is hostile). However, if L(x) is not monotonic with respect to x, one can still define
a decision rule based on x that is equivalent to the rule based on L(x). In such cases the classifier
would conclude that the observation is hostile if the value x was included in specified subintervals.
(Egan provides an example of such a case which, for brevity, is omitted here.) Constructing a ROC
curve by varying θ_x across x in a case like this may result in an ill-behaved ROC curve.

If the decision rule is based on L(x) rather than x, the slope of the ROC curve must be
nonincreasing with respect to Pfa, since the decisions are always more aggressive as θ_L(x) decreases.
Egan refers to a ROC curve constructed in this manner as a proper ROC curve, and he notes that
the slope of a proper ROC curve is nonincreasing with respect to Pfa. That is, a proper ROC curve
is always concave down. Logically, if the evidence that the object is hostile is strong, Pd ≫ Pfa,
and as the strength of the evidence decreases Pd will approach Pfa.


2.2.2 Comparing ROC Curves. There are several methods for comparing ROC curves.
The most obvious comparison is visual. If one curve always has a higher Pd than another for a
given Pfa, then the classifier corresponding to the higher curve is superior. These cases are not
very  interesting,  however,  and  other  methods  of  comparison  exist  for  evaluation  in  cases  where  one 
curve  is  not  always  superior.  One  commonly  used  method  is  to  compare  the  area  under  the  curve 
(AUC)  [6],  [12],  [3].  Classifiers  that  correspond  with  curves  having  greater  AUCs  are  considered 
better.  The  AUC  can  be  computed  using  a  trapezoidal  approximation  of  the  area.  Another  method 
is  to  compute  the  average  metric  distance  between  the  ROC  curve  and  the  chance  line  [1].  This 
distance can be computed by

\[ MD = \frac{1}{n} \sum_{i=1}^{n} \left\| \left( P_d(\theta_i), P_{fa}(\theta_i) \right) - \left( \theta_i, \theta_i \right) \right\|_1 , \]

where (Pd(θ_i), Pfa(θ_i)) denotes the ith point sampled from the ROC curve and ||·||_1 is the 1-norm
on R^2. The classifier with the larger MD is considered superior. A detailed discussion of ROC curve
comparison methods (including the derivation of the average metric distance and a multinomial
selection algorithm) is found in Alsing [1].
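As a rough illustration of the trapezoidal AUC approximation mentioned above, the following sketch sums trapezoids between consecutive sampled ROC points. The function name `trapezoidal_auc` and the sample points are hypothetical and are not taken from the cited works.

```python
def trapezoidal_auc(points):
    """Approximate the area under a ROC curve from (Pfa, Pd) points."""
    pts = sorted(points)  # order by Pfa, then by Pd
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Area of the trapezoid between two neighbouring samples.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical sampled ROC points, including the end points (0, 0) and (1, 1).
roc = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]
auc = trapezoidal_auc(roc)

# The chance line Pd = Pfa serves as a reference with area one half.
chance_auc = trapezoidal_auc([(0.0, 0.0), (1.0, 1.0)])
print(auc, chance_auc)
```

Under this approximation, a classifier whose sampled curve lies above the chance line yields an area greater than 0.5.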

2.3  Fusing  Classifiers 

Saranli  and  Demirekler  note  that  “decision  combination  systems  are  of  considerable  interest 
to  a  large  number  of  pattern  recognition  fields”  [25].  They  also  noted  some  potential  benefits 
that  may  be  gained  by  combining  classifiers.  Bayesian  classifiers  work  by  estimating  the  posterior 
probability  of  a  particular  observation  belonging  to  a  particular  class,  and  in  statistical  estimation 
there is inherent variance in the estimate because it is based on sample data. Tumer and Ghosh
showed  that  averaging  the  estimates  from  several  different  classifiers  reduces  this  variance  [26]. 
Saranli  and  Demirekler  also  pointed  out  the  rather  obvious  notion  that  some  classifiers  may  work 
better  than  others  for  certain  observations  or  particular  types  of  problems  [25].  By  using  several 
experts  the  system  designer  hopes  to  have  a  better  chance  at  correct  classification. 
2.3.1 Categories of Decision Combination. Dasarathy notes that classifier decision fusion
is a subset of sensor fusion. He defines sensor fusion as “the study of optimal information processing
in distributed multisensor environments through intelligent integration of the multisensor data” [5],
and he notes that there is a three-level hierarchy of fusion, comprised of data, feature, and decision
fusion. Data is defined as raw information which can be organized or combined to create features
relating  to  a  particular  observation.  The  features  can  then  be  used  by  a  classifier  to  arrive  at  a 
decision.  Dasarathy  defines  five  separate  categories  of  fusion  problems  based  on  input  and  output 
modes. 

1. Data input/Data output

2. Data input/Feature output

3. Feature input/Feature output

4. Feature input/Decision output

5. Decision input/Decision output

The  final  category,  where  individual  decisions  are  used  as  input  to  arrive  at  a  system  decision, 
is  an  especially  important  category  and  the  focus  of  this  research.  Dasarathy  notes  that  this  type 
of fusion is applicable no matter what types of classifier systems are employed. By using only the
final  decision  from  each  classifier,  the  system  is  not  hindered  by  instances  of  incompatibility.  The 
individuals  may  be  designed  using  different  architectures,  methodologies,  or  philosophies;  but  a 
system  combining  the  discrete  decision  values  is  not  affected  by  such  disparities  in  design.  Further, 
Varshney  notes  that  many  problems  have  practical  limitations  on  the  amount  of  data  that  can  be 
transferred from the individual classifiers to the fusion center [28]. In these problems, the geographical
proximity of the individual classifiers and the bandwidth available for electronic communication
contribute  to  a  situation  in  which  it  may  be  beneficial  to  transmit  as  little  data  as  possible. 


Table 2: Conditions for Statistical Independence.

Conditions for Statistical Independence

Pr{A | B} = Pr{A}

Pr{B | A} = Pr{B}

Pr{A ∩ B} = Pr{A} Pr{B}

The paper by Xu et al. is a seminal work in the field [30]. They further categorized the types
of decision combination into three groups [30]. In Type I, only the final decision made by each
classifier is sent to the fusion center (e.g., Class = B). In Type II each classifier reports a ranked
list of possibilities, and the combiner uses the ranked list to make a decision (e.g., 1st Choice = B,
2nd Choice = A, etc.). A Type III combiner accepts a list of possible decisions along with some
measure of confidence in those decisions (e.g., A = 0.60, B = 0.25, C = 0.15). The focus of their
work was on Type I decision combination “due to its generality”.

2.3.2  Independence.  Analysis  of  multiple  classifier  systems  often  includes  some  discussion 
on  independence.  However,  the  research  often  seems  conflicted  about  what  type  of  independence  is 
important.  Some  of  the  literature  discusses  statistical  independence  between  the  classifiers  [21]  [14], 
while  other  works  are  more  concerned  with  the  classifiers  making  independent  errors  [13]  [24]. 
Others  point  out  that  combining  classifiers  that  make  negatively  correlated  errors  can  enhance 
system  performance  [17]. 

Two events A and B are statistically independent if any of the conditions in
Table 2 holds [29]. The primary reason for an assumption of statistical independence is that
it  makes  computing  joint  probabilities  much  simpler.  With  this  assumption,  analysts  can  easily 
examine  MCSs  by  computing  joint  probabilities  without  accounting  for  correlation  between  the 
individual  classifiers.  It  may  also  make  sense  to  ignore  correlation  in  a  notional  analysis  unless 
a  realistic  estimate  of  dependence  is  available.  That  said,  an  in-depth  analysis  of  a  specific  MCS 
should probably include an analysis and discussion of classifier dependence. That is, general analyses
should probably assume statistical independence, and specific analyses should not.
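To show why the independence assumption simplifies the computation of joint probabilities, the sketch below combines two classifiers at fixed operating points. The operating points and function names are hypothetical, and independence is assumed to hold conditionally on the true class, so that the product rule from Table 2 applies separately to Pd and to Pfa.

```python
def and_rule(p1, p2):
    """Probability that both classifiers declare hostile (independence assumed)."""
    return p1 * p2


def or_rule(p1, p2):
    """Probability that at least one classifier declares hostile (independence assumed)."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


# Hypothetical operating points (Pd, Pfa) for two classifiers.
pd1, pfa1 = 0.9, 0.2
pd2, pfa2 = 0.8, 0.1

and_point = (and_rule(pd1, pd2), and_rule(pfa1, pfa2))  # system (Pd, Pfa) under AND
or_point = (or_rule(pd1, pd2), or_rule(pfa1, pfa2))     # system (Pd, Pfa) under OR
print(and_point, or_point)
```

With correlated classifiers these products no longer hold, and the joint probabilities would have to be estimated directly.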
Hansen and Salamon assumed that the individual classifiers in a theoretical MCS made independent
random errors [13], and many other researchers followed suit. Furthermore, Giacinto et al.
observe that “most combination methods described in the literature assume that MCSs are made
up of classifiers making independent classification errors” [10]. However, Kuncheva and her colleagues
point out that negatively correlated errors can enhance MCS performance [17], so perhaps
it is not independent errors, but opposing errors that are a key element in MCS construction.

2.4 Different Methods for Fusing Classifiers

This  section  presents  a  brief  overview  of  some  of  the  philosophies  for  fusing  decisions.  These 
methodologies  were  selected  based  on  their  foundations  in  optimization  theory  or  their  applicability 
to  the  remainder  of  the  thesis. 

2.4.1 Constant False Alarm Rate. When the objective of the classifier system is to detect
a  target  in  clutter  it  is  conceivable  that  the  clutter  patterns  might  vary  from  target  to  target. 
Furthermore,  if  a  target  is  mobile  the  clutter  patterns  around  that  particular  target  may  not  be 
stationary.  One  philosophy  for  detecting  targets  in  scenarios  like  these  is  called  Constant  False 
Alarm  Rate  (CFAR)  [28].  In  CFAR  fusion,  the  decision  thresholds  used  by  the  classifiers  in  the 
MCS  are  allowed  to  vary  in  such  a  way  that  the  detection  probability  is  maximized  while  the  system 
maintains  a  specified  false  alarm  rate.  Two  popular  approaches  for  CFAR  are  Cell  Averaging  CFAR 
(CA-CFAR)  and  Order  Statistics  CFAR  (OS-CFAR).  A  typical  method  for  CFAR  optimization  is 
the use of the Lagrangian expression given by

\[ L(\theta_1, \theta_2, \ldots, \theta_K) = P_d(\theta_1, \theta_2, \ldots, \theta_K) - \lambda \left( P_{fa}(\theta_1, \theta_2, \ldots, \theta_K) - p \right), \]

where p is the maximum allowable false alarm rate for the system and the θ_i are the decision
thresholds for the individual classifiers. Setting the gradient of L equal to the zero vector and
solving for λ and the θ_i yields candidates for the optimal threshold values.
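Written out, the gradient condition gives one equation per threshold plus the false alarm constraint. The following is a sketch that assumes Pd and Pfa are differentiable functions of the thresholds, which the text does not state explicitly.

```latex
\begin{align*}
\frac{\partial L}{\partial \theta_i}
  &= \frac{\partial P_d}{\partial \theta_i}
   - \lambda \, \frac{\partial P_{fa}}{\partial \theta_i} = 0,
   \qquad i = 1, \ldots, K, \\
\frac{\partial L}{\partial \lambda}
  &= -\left( P_{fa}(\theta_1, \ldots, \theta_K) - p \right) = 0 .
\end{align*}
```

The first K equations say that, at a candidate optimum, the marginal gain in detection probability from moving any threshold equals λ times the marginal change in false alarm probability. The last equation forces the system to operate exactly at the allowed false alarm rate p.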

2.4.2 Boolean Fusion Rules. One method for combining classifier decisions is to use
Boolean, or simple, rules. One simple rule is for the system to conclude that a hostile target is
present if and only if all the classifiers in the system conclude that their observations indicate
hostile targets. This rule is hereafter referred to as the AND rule. Another simple rule is for the
system to conclude that a hostile target is present if any of the individual classifiers indicate hostile
targets. This rule is called the OR rule. When the system includes more than two classifiers, other
simple rules are possible. For example, in a system where each classifier can only output two
labels (f or h), the number of possible Boolean rules is 2^(2^K), where K is the number of individual
classifiers (e.g., 2^(2^3) = 256 rules for K = 3).

Liggins gives a subset of the possible Boolean rules for a three classifier system in a table
similar to Table 3 [18]. The first three columns of the table indicate the decisions of the individual
classifiers (1 for hostile, 0 for friendly). The remaining columns are vector representations of the
fusion rules. A “1” in a particular row indicates that the system will conclude hostile if the outputs
of the individual classifiers correspond with those in the first three columns. For example, under
rule r2 the system will only classify an object as hostile if all three classifiers conclude hostile,
and under r4 the system ignores the decision of classifier A3 and classifies an object as hostile if
A1 and A2 decide hostile.
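The vector representation can be read as a lookup table, as the following sketch illustrates. The names `ROWS` and `apply_rule` are hypothetical. The row order (0,0,0), (0,0,1), ..., (1,1,1) follows Table 3, and the two vectors encode rules r2 and r4 as described in the text.

```python
from itertools import product

# Joint decisions (A1, A2, A3) in the row order of Table 3: (0,0,0), (0,0,1), ..., (1,1,1).
ROWS = list(product((0, 1), repeat=3))


def apply_rule(rule, decisions):
    """Return the system decision (1 = hostile) for a tuple of classifier decisions."""
    return rule[ROWS.index(tuple(decisions))]


# Vector forms of two rules described in the text.
R2 = (0, 0, 0, 0, 0, 0, 0, 1)  # hostile only if all three classifiers decide hostile
R4 = (0, 0, 0, 0, 0, 0, 1, 1)  # hostile if A1 and A2 decide hostile; A3 is ignored

print(apply_rule(R2, (1, 1, 0)))  # A3 disagrees, so the three-way AND fails
print(apply_rule(R4, (1, 1, 0)))  # A1 and A2 agree, so r4 declares hostile
```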

He notes that the 256 theoretical rules for the 3-detector case can be reduced to 18 rules by
application of the monotonicity rule, which assumes that Pd must be greater than Pfa. The 18
relevant rules represent “all possible physical contingencies.” For example, a rule represented by
(0, 1, 0, 0, 0, 0, 0, 0)^T is not a relevant rule. Under such a rule the system would conclude hostile only
if A3 concluded hostile and the other classifiers concluded friendly. This rule would be illogical,
because it ignores other cases where A3 decides that the target is hostile. Thus, any relevant fusion
rule represented with a 1 in the second row will also have a 1 in the fourth, sixth and eighth rows.
Also note that rules r1 and r256 are not relevant fusion rules. They were included merely to bound
the set of relevant rules.

Table 3: Relevant Fusion Rules for a 3-Classifier MCS.

Classifiers   Monotonic Fusion Rules
A1  A2  A3    r1   r2   r4   r6   r8   r16  r18  r20  r22  r24
0   0   0     0    0    0    0    0    0    0    0    0    0
0   0   1     0    0    0    0    0    0    0    0    0    0
0   1   0     0    0    0    0    0    0    0    0    0    0
0   1   1     0    0    0    0    0    0    1    1    1    1
1   0   0     0    0    0    0    0    1    0    0    0    0
1   0   1     0    0    0    1    1    1    0    0    1    1
1   1   0     0    0    1    0    1    1    0    1    0    1
1   1   1     0    1    1    1    1    1    1    1    1    1

A1  A2  A3    r34  r52  r56  r64  r86  r88  r96  r120 r128 r256
0   0   0     0    0    0    0    0    0    0    0    0    1
0   0   1     0    0    0    0    1    1    1    1    1    1
0   1   0     0    1    1    1    0    0    0    1    1    1
0   1   1     1    1    1    1    1    1    1    1    1    1
1   0   0     1    0    0    1    0    0    1    0    1    1
1   0   1     1    0    1    1    1    1    1    1    1    1
1   1   0     1    1    1    1    0    1    1    1    1    1
1   1   1     1    1    1    1    1    1    1    1    1    1
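The reduction from 256 rules to 18 can be checked by brute force. The sketch below interprets the monotonicity rule as the usual monotone Boolean function condition (changing any classifier's decision from friendly to hostile never changes the system decision from hostile to friendly). That interpretation is an assumption, chosen because it matches the row-two example above. All names are hypothetical.

```python
from itertools import product

ROWS = list(product((0, 1), repeat=3))  # joint decisions (A1, A2, A3)


def is_monotone(rule):
    """True if raising any classifier decision never lowers the system decision."""
    for a, ra in zip(ROWS, rule):
        for b, rb in zip(ROWS, rule):
            # If b is at least as hostile as a in every position,
            # the system decision under b must be at least that under a.
            if all(x <= y for x, y in zip(a, b)) and ra > rb:
                return False
    return True


all_rules = list(product((0, 1), repeat=len(ROWS)))  # 2^(2^3) = 256 rule vectors
monotone = [r for r in all_rules if is_monotone(r)]
# Drop the two constant rules (never hostile, always hostile), which only bound the set.
relevant = [r for r in monotone if 0 < sum(r) < len(ROWS)]

print(len(all_rules), len(monotone), len(relevant))
```

Under this interpretation the enumeration yields 20 monotone rules, of which 18 remain once the two constant rules are set aside.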

The rules in Table 3 can be grouped into three classes: 1-classifier rules, 2-classifier rules and
3-classifier rules. Liggins notes that rules r16, r52 and r86 make up the subset of 1-classifier rules
(e.g., for r16 the system concludes hostile if classifier A1 concludes hostile). He also notes that the
2-classifier rules can be broken into AND rules and OR rules. From this point forward, AND rules
may be indicated with a ∧, and OR rules may be represented with ∨. Liggins also categorizes AND
and OR rules for the 3-classifier case, as well as the majority vote and a case Liggins defines as
sensor dominance, where the system always accepts the decision of one classifier but accepts the
decisions of the other two classifiers only if they both agree that a hostile target is present. He fails
to categorize the remaining three rules (r8, r20 and r22), all of which are similar. In these rules
one sensor must conclude hostile and at least one of the others must agree. Such a configuration
will henceforth be called Sensor Corroboration, because at least one sensor must corroborate the
decision of the primary sensor. A practical example of a case where sensor corroboration would be
appropriate is a threat detection scenario in which one classifier is trained to detect movement and
the other two are each trained to identify a certain type of enemy vehicle. If the first classifier
detects movement and one of the others confirms that the object is an enemy vehicle, then the
object should be classified as a viable threat. Table 4 categorizes the 18 relevant rules.

Table 4: Categories of Relevant Boolean Fusion Rules for a 3-Classifier System

Category                 Rule   Meaning
1-Classifier Rules
                         r16    A1
                         r52    A2
                         r86    A3
2-Classifier Rules
  AND Rules              r4     A1 ∧ A2
                         r6     A1 ∧ A3
                         r18    A2 ∧ A3
  OR Rules               r64    A1 ∨ A2
                         r96    A1 ∨ A3
                         r120   A2 ∨ A3
3-Classifier Rules
  AND Rule               r2     A1 ∧ A2 ∧ A3
  OR Rule                r128   A1 ∨ A2 ∨ A3
  Majority Vote          r24    (A1 ∧ A2) ∨ (A1 ∧ A3) ∨ (A2 ∧ A3)
  Sensor Dominance       r34    A1 ∨ (A2 ∧ A3)
                         r56    A2 ∨ (A1 ∧ A3)
                         r88    A3 ∨ (A1 ∧ A2)
  Sensor Corroboration   r8     A1 ∧ (A2 ∨ A3)
                         r20    A2 ∧ (A1 ∨ A3)
                         r22    A3 ∧ (A1 ∨ A2)

A majority vote among the available classifiers is a simple rule that has received much attention
in the literature [13] [30] [8] [15] [19] [9] [16] [?]. Voting rules are among the easiest to
conceptualize, because everyday decisions are often made in this manner. Xu et al. noted that not
all voting rules are necessarily majority votes. One can specify a more conservative rule (e.g.,
require a 2/3 majority) or, in a multi-class problem, a less conservative rule in which the system
decision is the class with the largest number of votes, whether or not there is a majority.
Table 5: Ralston’s Performance Matrix.

Output State     Truth
Classifier k     Friend          Hostile
1                Pr{1 | F}       Pr{1 | H}
...              ...             ...
m_k              Pr{m_k | F}     Pr{m_k | H}

Most theoretical analysis has been on the majority vote. Hansen and Salamon showed that
the performance of the system will be greater than the performance of the best individual classifier
under certain assumptions [13]. They noted that the classification accuracy of the system increases
with the number of classifiers under the assumptions that (a) each classifier is right at least half
the time, and (b) the classifiers make independent errors. Matan gives both upper and lower bounds
on classification accuracy for the more general case of a “k-of-n special majority”, where n is the
number of classifiers in the MCS and k is any integer greater than n/2 [19]. Kuncheva et al.
also derive upper and lower bounds for the majority vote, but their work also takes into account
the pairwise dependence between the classifiers in the MCS [16]. None of the literature reviewed here
discusses theoretical limits on the ROC curve of a majority voting system or any other MCS for that matter.
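The effect of adding classifiers to a majority vote can be illustrated with the binomial distribution. The sketch below makes the simplifying assumption that every classifier has the same accuracy p and errs independently. The function name is hypothetical, and the computation is a textbook binomial tail rather than a reproduction of the cited derivations.

```python
from math import comb


def majority_vote_accuracy(n, p):
    """Probability that more than half of n independent classifiers are correct,
    when each classifier is correct with probability p."""
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))


for n in (1, 3, 5):
    print(n, majority_vote_accuracy(n, 0.8))
```

Under these assumptions, an individual accuracy of 0.8 rises to roughly 0.896 with three voters and 0.942 with five, while an accuracy of exactly 0.5 gives no improvement for an odd number of voters.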

2.4.3 Identification System Operating Characteristic Curves. Ralston adapted the concepts
of likelihood ratios to determine the best choice of combined classifier output states [22].
Given K classifiers, each with m_k output states, the purpose of Ralston’s combat identification
system is to determine whether an exemplar is a friend or hostile. Each classifier k ∈ {1, 2, ..., K}
has a performance matrix (denoted PM and determined via system testing, etc.) with two columns.
The first column corresponds to the probability of a particular output state given that the exemplar
is truly a friend. The second column gives the probability of each output state given that the
exemplar is hostile. Table 5 shows an example PM.

There is a system output state corresponding with each possible combination of outputs from
the individual classifiers in the system. Thus, if there are m_k output states for each classifier k,
there are

\[ N = \prod_{k=1}^{K} m_k \]

output states for the system. Assuming that the classifiers in the system are not correlated, the
conditional probability that a friendly object will yield any output state can be computed as the
product of the appropriate cells from the first column of each classifier's PM. For example, one output state
for a notional system occurs when classifier A1 is in state q, A2 is in state r, and A3 is in state s.
The probability that a friendly object puts the system in this state is the product of the (q, 1) cell
of the PM for A1, the (r, 1) cell of the PM for A2, and the (s, 1) cell of the PM for A3. The
probability that a hostile object will yield a particular output state can be computed in a similar
manner.

Also  worth  mentioning  is  Ralston’s  representation  of  possible  output  state  combinations.  He 
suggests  representing  each  rule  or  output  state  combination  as  a  vector  R  of  length  N.  If  the 
rule  indicates  that  a  particular  output  state  j  will  force  the  system  to  conclude  that  an  object  is 
hostile,  the  vector  R  contains  a  1  in  the  jth  element  of  R.  Otherwise,  R  contains  a  0.  With  this 
representation of the fusion rule, Ralston was able to determine $P_D$ and $P_{FA}$ with the following
formulas:

$$P_D = \sum_{j=1}^{N} \Pr\{j \mid H\} \cdot R(j)$$

$$P_{FA} = \sum_{j=1}^{N} \Pr\{j \mid F\} \cdot R(j)$$


Ralston then defines an Identification System Operating Characteristic (ISOC) curve by computing
the likelihood ratio $\Pr\{j \mid H\} / \Pr\{j \mid F\}$ for each system output state $j$ and ordering the
likelihood ratios from greatest to least. The output state with the highest likelihood ratio is the most
conservative output state and will produce the best possible $P_D$ for its $P_{FA}$. The next output state
is a combination of the previous state with the next most conservative state, and so on. By plotting
the $P_D$ and $P_{FA}$ values for these successive combinations of output states, Ralston is able to provide
the optimal combination of states for each $P_{FA}$ without enumerating all $2^N$ possible output state
combinations. This is analogous to a ROC curve for a single classifier whose decision is based on
$L(x)$, except that Ralston treats $x$ as a single output state instead of a single observation [6].
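
The ordering procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `isoc_points` and the example probabilities are hypothetical and are not taken from Ralston's work, and output states with $\Pr\{j \mid F\} = 0$ are assumed to have an infinite likelihood ratio.

```python
import math

def isoc_points(p_given_hostile, p_given_friend):
    """Order output states by likelihood ratio and accumulate (P_FA, P_D) points."""
    def ratio(j):
        # Likelihood ratio Pr{j|H} / Pr{j|F}; assumed infinite when Pr{j|F} is zero.
        pf = p_given_friend[j]
        return math.inf if pf == 0 else p_given_hostile[j] / pf

    # Most conservative state (largest ratio) first.
    order = sorted(range(len(p_given_hostile)), key=ratio, reverse=True)
    p_fa, p_d, fa, d = [], [], 0.0, 0.0
    for j in order:
        # Adding state j to the rule adds its probability under each truth.
        fa += p_given_friend[j]
        d += p_given_hostile[j]
        p_fa.append(fa)
        p_d.append(d)
    return order, p_fa, p_d

# Hypothetical three-state system: Pr{j|H} and Pr{j|F} for j = 0, 1, 2.
order, p_fa, p_d = isoc_points([0.1, 0.2, 0.7], [0.6, 0.3, 0.1])
```

Each successive pair `(p_fa[i], p_d[i])` is one point on the curve, so only $N$ combinations are evaluated instead of all $2^N$.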

2.4.4 ROC Fusion. Oxley and Bauer presented a novel approach for classifier system
evaluation  by  showing  that  it  is  possible  to  analytically  construct  the  ROC  curve  for  an  MCS  based 
on  certain  fusion  rules  (AND  and  OR)  using  only  data  from  the  ROC  curves  for  the  individual 
classifiers  in  the  system  [20] .  The  purpose  of  the  classifier  systems  researched  in  their  work  was  to 
determine  if  the  system  was  in  one  of  two  states,  (e.g.,  friendly  or  hostile).  Their  work  resulted  in 
four  primary  contributions. 

First  they  defined  the  difference  between  fusion  within  and  across  target  types.  A  system  of 
classifiers  that  are  fused  within  is  a  system  in  which  all  classifiers  are  trained  to  detect  a  particular 
type  of  target.  Thus,  they  share  the  same  prior  probability  of  detection.  Moreover,  there  are  only 
two  possibilities  for  truth  in  such  a  system.  Either  the  target  is  present,  or  it  is  not.  A  system  that 
is  fused  across  target  types  includes  classifiers  trained  to  detect  a  number  of  target  types.  Each 
of  these  types  of  targets  may  have  a  different  prior  probability  of  detection,  and  since  the  system 
seeks  different  types  of  targets,  it  can  accidentally  arrive  at  the  correct  conclusion  if  a  classifier 
seeking  target  type  A  incorrectly  detects  a  target  when  a  target  type  B  is  present.  For  reasons 
such  as  these,  an  across  system  may  be  more  difficult  to  analyze  than  a  within  system. 

The second contribution was the derivation of formulas for $P_D$ and $P_{FA}$ for logical AND and
OR rules in within and across systems. The third and fourth contributions were very closely related.
Rather than a traditional definition for a ROC curve ($P_D$ vs. $P_{FA}$), Oxley and Bauer defined a ROC
curve as the maximum value of $P_D$ for each possible $P_{FA}$ for that particular classifier. Although
this  contribution  may  seem  trivial,  it  allowed  Oxley  and  Bauer  to  analytically  determine  the  ROC 
curves  for  logical  AND  and  OR  rules. 
The example used was a system designed to solve a two-class problem in which there were
(a) two classifiers, (b) each classifier could output two labels, and (c) the system could output
two  labels.  However,  Chapter  3  of  this  document  shows  that  the  results  for  AND  and  OR  can  be 
extended  to  any  number  of  classifiers  and  labels. 
III. Research Methodology and Derivation

3.1 Introduction

This  chapter  provides  a  matrix  algebra  representation  for  evaluating  the  performance  of 
Boolean  fusion  rules  in  an  MCS  designed  for  two-class  classification.  The  representation  is  general 
in  that  it  accommodates  rules  for  fusing  within  and  across  target  types  and  that  it  allows  for  any 
number  of  classifiers,  each  of  which  can  output  any  number  of  labels. 

Assume there is a classifier trained to detect a hostile target. Formally, consider a set of
events $\mathcal{E}$, which can be divided into two subsets. One subset consists of instances of a hostile target
($\mathcal{E}_h \subset \mathcal{E}$). The other subset ($\mathcal{E}_f \subset \mathcal{E}$) consists of objects not belonging to $\mathcal{E}_h$ and corresponds to
friendly objects. A sensor $S$ maps an event to a feature set $\mathcal{X}$. Thus, each feature vector $x$ is a
random vector since it is the image of the random variable $S$. Events in the subset $\mathcal{E}_h$ map to a
subset of $\mathcal{X}$ called $\mathcal{X}_h$, and events in $\mathcal{E}_f$ map to $\mathcal{X}_f \subset \mathcal{X}$. Let $\Theta$ be a threshold set (or a set of
parameters) used by $A(x, \theta \in \Theta)$ to map each feature vector to a label set $\mathcal{L} = \{f, h\}$. That is,
$l = A(x, \theta)$ and $A(x, \theta) : \mathcal{X} \to \mathcal{L}$ for each $\theta \in \Theta$. Figure 4 illustrates this process.

3.2  Notation 

Throughout the discussion assume there are a finite number, $K$, of classifiers, and each
classifier is, in fact, a family of classifiers dependent upon a parameter, $\theta_k \in \Theta_k$. Each classifier
$A_k$ is coupled with a sensor $S_k$ which maps events to the feature set $\mathcal{X}_k$, and outputs a label in


Figure 4: Event, Feature and Label Sets for a Single Classifier System.
(Diagram: the sensor $S$ maps the event set to the feature set $\mathcal{X}$, and the classifier maps the feature set to the label set $\mathcal{L}$.)
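Table 6: Conditional Performance Matrix for a Classifier with Two Labels.

Output label | Feature set (truth): $x_k \in \mathcal{X}_{k,f}$ | Feature set (truth): $x_k \in \mathcal{X}_{k,h}$
1 | $\Pr\{A_k(x_k) = 1 \mid x_k \in \mathcal{X}_{k,f}\}$ | $\Pr\{A_k(x_k) = 1 \mid x_k \in \mathcal{X}_{k,h}\}$
2 | $\Pr\{A_k(x_k) = 2 \mid x_k \in \mathcal{X}_{k,f}\}$ | $\Pr\{A_k(x_k) = 2 \mid x_k \in \mathcal{X}_{k,h}\}$
... | ... | ...
$m_k$ | $\Pr\{A_k(x_k) = m_k \mid x_k \in \mathcal{X}_{k,f}\}$ | $\Pr\{A_k(x_k) = m_k \mid x_k \in \mathcal{X}_{k,h}\}$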

the label set $\mathcal{L}_k$. The cardinality of the label set $\mathcal{L}_k$ is $m_k$ (i.e., there are $m_k$ labels). A feature
vector $x_k \in \mathcal{X}_k$ can indicate a friendly or hostile target, and there are two corresponding subsets
of $\mathcal{X}_k$: $\mathcal{X}_{k,h}$ is the subset containing feature vectors that should indicate hostile targets, and $\mathcal{X}_{k,f}$
is the subset containing feature vectors that should indicate friendly targets. A classifier output
denoting a hostile target is denoted by a lower case $h$ ($A_k(x_k) = h_k$), and the output denoting a
friendly target is denoted by a lower case $f$ ($A_k(x_k) = f_k$). The event, feature, and label sets for
the system will be denoted by a subscript $S$, not to be confused with the sensor $S$.

3.3  Classifier  Performance 

The  following  sections  provide  the  reader  with  some  tools  for  evaluating  classifier  performance. 
These  tools  are  applicable  to  any  classifier  designed  to  solve  a  two-class  problem;  however,  the 
examples presented are for a classifier that only outputs two labels ($f$ and $h$).

3.3.1  Conditional  Performance  Matrix.  One  can  summarize  the  performance  of  a  classifier 
operating  at  a  particular  decision  threshold  9  in  terms  of  the  conditional  probabilities  in  Table  1 
by  recording  them  in  a  matrix  equivalent  to  the  performance  matrix  defined  by  Ralston  [22]. 
This matrix will be called the Conditional Performance Matrix (CPM). For each classifier $k \in$
$\{1, 2, \ldots, K\}$, let $C_k$ denote the CPM corresponding to the $k$th classifier. Table 6 shows that each
column  corresponds  with  truth,  and  each  row  corresponds  with  a  particular  output  label. 

To  be  consistent,  the  first  row  should  correspond  with  the  friendly  label  and  the  last  row  should 
correspond  with  the  hostile  label.  Similarly,  the  first  column  should  correspond  with  instances  of 
friendly objects, and the second column should correspond with instances of hostile targets. If this
convention is kept, the last row of a 2 x 2 CPM identifies the $P_{FA}$ and $P_D$ for the classifier, and
the set of CPMs for all $\theta \in \Theta_k$ can be used to construct the ROC curve for that classifier.

$$C_k = \begin{bmatrix} \Pr\{A_k(x_k) = f_k \mid x_k \in \mathcal{X}_{k,f}\} & \Pr\{A_k(x_k) = f_k \mid x_k \in \mathcal{X}_{k,h}\} \\ \Pr\{A_k(x_k) = h_k \mid x_k \in \mathcal{X}_{k,f}\} & \Pr\{A_k(x_k) = h_k \mid x_k \in \mathcal{X}_{k,h}\} \end{bmatrix} = \begin{bmatrix} P_{C,k} & P_{M,k} \\ P_{FA,k} & P_{D,k} \end{bmatrix} = \begin{bmatrix} 1 - P_{FA,k} & 1 - P_{D,k} \\ P_{FA,k} & P_{D,k} \end{bmatrix}$$


Definition III.1. A Conditional Performance Matrix (CPM) for a classifier $k$ is an $m_k \times 2$
matrix in which the columns correspond with truth, the rows correspond with the classifier's output
labels, and the $(i, j)$ cell is the conditional probability of the classifier outputting label $i$ when the
true state of the system is $j$. The sum of each column of the CPM is unity (i.e., the CPM is column
stochastic).
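
As a minimal sketch of Definition III.1 for the two-label case, the following Python fragment builds a 2 x 2 CPM from an operating point and checks that it is column stochastic. The helper name `cpm_two_label` and the numerical values are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def cpm_two_label(p_fa, p_d):
    """2x2 CPM: rows = (friendly label, hostile label),
    columns = (friend is truth, hostile is truth)."""
    return np.array([[1.0 - p_fa, 1.0 - p_d],
                     [p_fa,       p_d]])

# Hypothetical operating point at one threshold setting.
C = cpm_two_label(p_fa=0.1, p_d=0.8)

# Each column is a conditional distribution over labels, so it sums to one.
column_sums = C.sum(axis=0)
```

Sweeping the threshold and collecting the last row of each resulting CPM would trace out the classifier's ROC curve, as noted in the text.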


3.3.2 Prior Probabilities Matrix. Using the definition of conditional probability,

$$\Pr\{A \mid B\} = \frac{\Pr\{A \cap B\}}{\Pr\{B\}},$$

one can compute the joint probability $\Pr\{A \cap B\}$ by simply multiplying the conditional probability
by the a priori probability $\Pr\{B\}$. Consequently, one can multiply each column of the CPM by
the appropriate a priori probability to determine the unconditional probability of each output
state. Since the two subsets $\mathcal{X}_{k,h}$ and $\mathcal{X}_{k,f}$ are complementary, the probabilities $\Pr\{x_k \in \mathcal{X}_{k,h}\}$
and $\Pr\{x_k \in \mathcal{X}_{k,f}\}$ are also complementary. Thus, one could multiply the first column of the CPM
by $\Pr\{x_k \in \mathcal{X}_{k,f}\}$ and the second column by $\Pr\{x_k \in \mathcal{X}_{k,h}\}$ to determine the joint probabilities of
the output labels coinciding with a particular true state. The result is a matrix with the following
construction.

$$\begin{bmatrix} \Pr\{A_k(x_k) = f_k \cap x_k \in F_k\} & \Pr\{A_k(x_k) = f_k \cap x_k \in H_k\} \\ \Pr\{A_k(x_k) = h_k \cap x_k \in F_k\} & \Pr\{A_k(x_k) = h_k \cap x_k \in H_k\} \end{bmatrix}$$

Let $\alpha_k := \Pr\{x_k \in H_k\}$ and $(1 - \alpha_k) := \Pr\{x_k \in F_k\}$. Define a 2 x 2 diagonal matrix $p_k$ as
follows.

$$p_k = \begin{bmatrix} 1 - \alpha_k & 0 \\ 0 & \alpha_k \end{bmatrix}$$

The matrix $p_k$ is called the Prior Probabilities Matrix for classifier $k$.

Definition III.2. The Prior Probabilities Matrix (PPM) for a particular type of target is a
2 x 2 diagonal matrix in which the (2,2) cell is the probability of observing that type of target, and
the (1,1) cell is a complementary value. Thus, the trace of a PPM is unity.
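3.3.3 Joint Performance Matrix. Now we can define the Joint Performance Matrix (JPM)
for a two-label classifier as follows.

$$\begin{aligned}
J_k &= \begin{bmatrix} \Pr\{\text{Friendly Label} \cap \text{No Target}\} & \Pr\{\text{Friendly Label} \cap \text{Target Present}\} \\ \Pr\{\text{Hostile Label} \cap \text{No Target}\} & \Pr\{\text{Hostile Label} \cap \text{Target Present}\} \end{bmatrix} \\
&= \begin{bmatrix} \Pr\{A_k(x_k) = f_k \cap x_k \in F_k\} & \Pr\{A_k(x_k) = f_k \cap x_k \in H_k\} \\ \Pr\{A_k(x_k) = h_k \cap x_k \in F_k\} & \Pr\{A_k(x_k) = h_k \cap x_k \in H_k\} \end{bmatrix} \\
&= C_k p_k \\
&= \begin{bmatrix} \Pr\{A_k(x_k) = f_k \mid x_k \in \mathcal{X}_{k,f}\} & \Pr\{A_k(x_k) = f_k \mid x_k \in \mathcal{X}_{k,h}\} \\ \Pr\{A_k(x_k) = h_k \mid x_k \in \mathcal{X}_{k,f}\} & \Pr\{A_k(x_k) = h_k \mid x_k \in \mathcal{X}_{k,h}\} \end{bmatrix} \begin{bmatrix} \Pr\{x_k \in \mathcal{X}_{k,f}\} & 0 \\ 0 & \Pr\{x_k \in \mathcal{X}_{k,h}\} \end{bmatrix} \\
&= \begin{bmatrix} 1 - P_{FA,k} & 1 - P_{D,k} \\ P_{FA,k} & P_{D,k} \end{bmatrix} \begin{bmatrix} 1 - \alpha_k & 0 \\ 0 & \alpha_k \end{bmatrix} \\
&= \begin{bmatrix} (1 - \alpha_k)(1 - P_{FA,k}) & \alpha_k (1 - P_{D,k}) \\ (1 - \alpha_k) P_{FA,k} & \alpha_k P_{D,k} \end{bmatrix}
\end{aligned}$$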

The  events  associated  with  each  element  of  the  JPM  are  mutually  exclusive  and  exhaustive, 
and  the  probabilities  define  the  entire  set  of  outcomes  for  the  classifier  k,  or  the  probability  of  each 
region  shown  in  the  shaded  portion  of  Figure  5.  Note  that  the  sum  of  the  elements  of  Jk  equals 
one,  and  the  trace  of  a  square  JPM  represents  the  classification  accuracy  for  the  classifier. 

Definition III.3. The Joint Performance Matrix (JPM) for a classifier $k$ is an $m_k \times 2$ matrix
in which the columns correspond with truth, the rows correspond with the classifier's output labels,
and cell $(i, j)$ is the joint probability of the classifier outputting label $i$ and the system being in state $j$. The
JPM gives the probabilities of all possible outcomes for the classifier. One can construct the JPM
from the CPM and PPM using the formula $J_k = C_k p_k$, and the sum of the elements of the JPM is
unity.
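
The construction $J_k = C_k p_k$ can be checked numerically. The sketch below is illustrative only: the helper name `jpm` and the values of the CPM and $\alpha_k$ are assumptions, not data from this work.

```python
import numpy as np

def jpm(cpm, alpha):
    """Joint Performance Matrix J = C p for a classifier with two truth states."""
    ppm = np.diag([1.0 - alpha, alpha])  # Prior Probabilities Matrix
    return cpm @ ppm                     # scales each CPM column by its prior

# Hypothetical CPM (P_FA = 0.1, P_D = 0.8) and target prior alpha = 0.3.
C = np.array([[0.9, 0.2],
              [0.1, 0.8]])
J = jpm(C, alpha=0.3)

total = J.sum()          # the four joint events are exhaustive
accuracy = np.trace(J)   # Pr{f and friend} + Pr{h and hostile}
```

For these assumed values the JPM is [[0.63, 0.06], [0.07, 0.24]], so the elements sum to one and the trace, 0.87, is the classification accuracy.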
Figure 5: Classification Errors for a Single Classifier System

3.4 Across Fusion State Probabilities

Consider a system made up of $K$ classifiers with similar constructions. Reconsider the set of
events $\mathcal{E}$, which is now partitioned into subsets: $\mathcal{E}_{h,k}$ consists of all hostile targets of type $k$ (for
all $k \in \{1, \ldots, K\}$), and $\mathcal{E}_f$ includes all events that are not elements of $\mathcal{E}_{h,1}$, $\mathcal{E}_{h,2}$, ..., or $\mathcal{E}_{h,K}$. Each classifier
in the system seeks a different type of hostile target (e.g., one is trained to detect trucks, another
is trained to detect artillery, etc.). The decisions from each classifier are sent to a fusion center or
combiner, where a fusion rule is applied to the labels. The result is the decision for the classifier
system in terms of the system label set $\mathcal{L}_S = \{f_S, h_S\}$. Figure 6 shows a two-classifier system in
which each classifier can output two labels for an arbitrary fusion rule $R(l_1, l_2)$.

3.4.1 Joint State Probabilities Matrix. For the example, each classifier has 4 possible
output states, and the associated probabilities are defined in their JPMs. Thus, there are 4 · 4 = 16
possible combinations of output states for a two-classifier system in which each classifier outputs
two labels. In general, there are

$$\prod_{k=1}^{K} 2 \cdot m_k = 2^K \prod_{k=1}^{K} m_k$$
Figure 6: Event, Feature and Label Sets for a Two-Classifier System Combining Decisions Across
Target Types


combinations  of  output  states.  Assuming  statistical  independence  between  the  individual  classifiers, 
one  could  compute  the  probability  of  a  particular  scenario  by  multiplying  the  probabilities  of 
the  associated  output  states.  One  way  of  accomplishing  this  is  to  enumerate  all  combinations  of 
individual  output  states  to  determine  the  probability  of  the  system  being  in  a  particular  state,  but 
a  mathematical  mechanism  for  determining  the  state  combinations  might  enable  the  application  of 
analysis  techniques  to  help  evaluate  the  performance  of  various  fusion  rules. 

The  Kronecker  product  is  an  operation  that  makes  this  possible.  The  Kronecker  product,  or 
tensor  product,  of  two  matrices  multiplies  each  element  of  one  matrix  by  each  element  of  the  other 
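matrix in the following manner [11]. Assume $A$ and $B$ are 2 x 2 matrices. The Kronecker product
of $A$ and $B$ is

$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B \\ a_{21} B & a_{22} B \end{bmatrix} = \begin{bmatrix} a_{11} b_{11} & a_{11} b_{12} & a_{12} b_{11} & a_{12} b_{12} \\ a_{11} b_{21} & a_{11} b_{22} & a_{12} b_{21} & a_{12} b_{22} \\ a_{21} b_{11} & a_{21} b_{12} & a_{22} b_{11} & a_{22} b_{12} \\ a_{21} b_{21} & a_{21} b_{22} & a_{22} b_{21} & a_{22} b_{22} \end{bmatrix}$$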

Note that $A \otimes B$ consists of all possible products of an A-matrix entry with a B-matrix entry.
Some fundamental properties of the Kronecker product are given in [4] and [11], and Van Loan
notes the widening use of the operation and lists some areas where Kronecker product research is
thriving: signal processing, image processing, semidefinite programming, quantum computing,
and fast Fourier transforms [27]. The Kronecker product is defined for any pair of matrices of any
dimensions, but for this application we will only be working with the $m_k \times 2$ CPMs and JPMs.
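
The block structure of the Kronecker product can be seen with NumPy's `np.kron`. The matrices below are arbitrary illustrative values, not quantities from this work.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

# np.kron(A, B) replaces each entry a_ij of A with the block a_ij * B.
K = np.kron(A, B)

# The upper-right 2x2 block is a_12 * B; the last entry is a_22 * b_22.
upper_right = K[0:2, 2:4]
last_entry = K[3, 3]
```

Here `K` is 4 x 4, `upper_right` equals `2 * B`, and `last_entry` equals `4 * 7`, matching the element layout shown in the definition.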

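Given the JPMs for any two classifiers (1 and 2) seeking different types of targets, the Kronecker
product of the two gives an $m_1 m_2 \times 4$ matrix in which each element contains the probability
of a particular output state combination. This matrix is called the Joint State Probabilities Matrix
(JSPM). If we denote the complement of $x$ as $\bar{x} = (1 - x)$, the JSPM for two 2 x 2 JPMs is below.

$$J_1 \otimes J_2 = \begin{bmatrix} (1 - \alpha_1)(1 - P_{FA,1}) & \alpha_1 (1 - P_{D,1}) \\ (1 - \alpha_1) P_{FA,1} & \alpha_1 P_{D,1} \end{bmatrix} \otimes \begin{bmatrix} (1 - \alpha_2)(1 - P_{FA,2}) & \alpha_2 (1 - P_{D,2}) \\ (1 - \alpha_2) P_{FA,2} & \alpha_2 P_{D,2} \end{bmatrix}$$

$$= \begin{bmatrix}
\bar{\alpha}_1 \bar{P}_{FA,1} \bar{\alpha}_2 \bar{P}_{FA,2} & \bar{\alpha}_1 \bar{P}_{FA,1} \alpha_2 \bar{P}_{D,2} & \alpha_1 \bar{P}_{D,1} \bar{\alpha}_2 \bar{P}_{FA,2} & \alpha_1 \bar{P}_{D,1} \alpha_2 \bar{P}_{D,2} \\
\bar{\alpha}_1 \bar{P}_{FA,1} \bar{\alpha}_2 P_{FA,2} & \bar{\alpha}_1 \bar{P}_{FA,1} \alpha_2 P_{D,2} & \alpha_1 \bar{P}_{D,1} \bar{\alpha}_2 P_{FA,2} & \alpha_1 \bar{P}_{D,1} \alpha_2 P_{D,2} \\
\bar{\alpha}_1 P_{FA,1} \bar{\alpha}_2 \bar{P}_{FA,2} & \bar{\alpha}_1 P_{FA,1} \alpha_2 \bar{P}_{D,2} & \alpha_1 P_{D,1} \bar{\alpha}_2 \bar{P}_{FA,2} & \alpha_1 P_{D,1} \alpha_2 \bar{P}_{D,2} \\
\bar{\alpha}_1 P_{FA,1} \bar{\alpha}_2 P_{FA,2} & \bar{\alpha}_1 P_{FA,1} \alpha_2 P_{D,2} & \alpha_1 P_{D,1} \bar{\alpha}_2 P_{FA,2} & \alpha_1 P_{D,1} \alpha_2 P_{D,2}
\end{bmatrix}$$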

As  the  rows  and  columns  of  the  CPM  and  JPM  correspond  with  labels  and  truth,  respectively, 
the  rows  of  the  JSPM  correspond  with  specific  pairs  of  labels,  and  the  columns  correspond  with 
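Table 7: Joint State Probabilities Matrix Composed from 2x2 JPMs.
(Rows: label sets. Columns: feature sets.)

Label set | $\mathcal{X}_{1,f} \times \mathcal{X}_{2,f}$ | $\mathcal{X}_{1,f} \times \mathcal{X}_{2,h}$ | $\mathcal{X}_{1,h} \times \mathcal{X}_{2,f}$ | $\mathcal{X}_{1,h} \times \mathcal{X}_{2,h}$
$(f_1, f_2)$ | $\bar{\alpha}_1 \bar{P}_{FA,1} \bar{\alpha}_2 \bar{P}_{FA,2}$ | $\bar{\alpha}_1 \bar{P}_{FA,1} \alpha_2 \bar{P}_{D,2}$ | $\alpha_1 \bar{P}_{D,1} \bar{\alpha}_2 \bar{P}_{FA,2}$ | $\alpha_1 \bar{P}_{D,1} \alpha_2 \bar{P}_{D,2}$
$(f_1, h_2)$ | $\bar{\alpha}_1 \bar{P}_{FA,1} \bar{\alpha}_2 P_{FA,2}$ | $\bar{\alpha}_1 \bar{P}_{FA,1} \alpha_2 P_{D,2}$ | $\alpha_1 \bar{P}_{D,1} \bar{\alpha}_2 P_{FA,2}$ | $\alpha_1 \bar{P}_{D,1} \alpha_2 P_{D,2}$
$(h_1, f_2)$ | $\bar{\alpha}_1 P_{FA,1} \bar{\alpha}_2 \bar{P}_{FA,2}$ | $\bar{\alpha}_1 P_{FA,1} \alpha_2 \bar{P}_{D,2}$ | $\alpha_1 P_{D,1} \bar{\alpha}_2 \bar{P}_{FA,2}$ | $\alpha_1 P_{D,1} \alpha_2 \bar{P}_{D,2}$
$(h_1, h_2)$ | $\bar{\alpha}_1 P_{FA,1} \bar{\alpha}_2 P_{FA,2}$ | $\bar{\alpha}_1 P_{FA,1} \alpha_2 P_{D,2}$ | $\alpha_1 P_{D,1} \bar{\alpha}_2 P_{FA,2}$ | $\alpha_1 P_{D,1} \alpha_2 P_{D,2}$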

specific  occurrences  in  truth,  or  Cartesian  products  of  the  different  feature  sets.  Table  7  illustrates 
this  point. 

Definition III.4. The Joint State Probabilities Matrix (JSPM), denoted $S_J$, for a system of
classifiers $\{1, 2, \ldots, K\}$ seeking $K$ different targets is a $\prod_{k=1}^{K} m_k \times 2^K$ matrix in which the columns
correspond with truth, the rows correspond with combinations of output labels from the individual
classifiers, and cell $(i, j)$ gives the probability of the classifier system outputting the combination
of labels $i$ when the true state of the system is $j$. The JSPM gives the probabilities of all possible
states for the classifier system, and the sum of the elements of the JSPM is unity.

Lemma  III.l.  Consider  an  MCS  in  which  each  individual  classifier  is  charged  with  identifying 
a  different  type  of  target.  Assuming  statistical  independence  between  the  individual  classifiers,  the 
JSPM  is  formed  by  the  Kronecker  product  of  the  JPMs  for  each  classifier  in  the  system. 

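Proof III.1. Let $l_k$ be a generic output label for each classifier $k \in \{1, \ldots, K\}$, and let $\mathcal{X}_{k,true}$ be a generic
true state relative to the target sought by classifier $k$. Since the classifiers are statistically independent
and each classifier seeks a different type of target, the probability for a given combination of
labels and truth is

$$\begin{aligned}
\Pr\{l_1 \times l_2 \times \cdots \times l_K \cap \mathcal{X}_{1,true} \times \mathcal{X}_{2,true} \times \cdots \times \mathcal{X}_{K,true}\} &= \Pr\{l_1 \cap \mathcal{X}_{1,true} \times \cdots \times l_K \cap \mathcal{X}_{K,true}\} \\
&= \Pr\{l_1 \cap \mathcal{X}_{1,true}\} \cdot \cdots \cdot \Pr\{l_K \cap \mathcal{X}_{K,true}\} \\
&= \prod_{k=1}^{K} \Pr\{l_k \cap \mathcal{X}_{k,true}\} \qquad (1)
\end{aligned}$$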
Note that each of the terms in the product defined in Equation 1 is an element of a different JPM,
$J_k$. Since the Kronecker product of several matrices consists of all possible products of the entries
of the individual matrices, the product in Equation 1 must be an element of the Kronecker product defined by

$$J_1 \otimes J_2 \otimes \cdots \otimes J_K.$$

Furthermore, it can be shown that the sum of the elements of a matrix formed from the
Kronecker product of several JPMs (or for that matter, any matrices whose elements sum to one)
is unity. Given $C_{D \times E}$ and $F_{G \times H}$, the Kronecker product $C_{D \times E} \otimes F_{G \times H}$ is a $DG \times EH$ matrix
whose elements represent each combination of the cells in $C$ and $F$ (i.e., $c_{de} f_{gh}$ for any allowable
values of $d$, $e$, $g$, and $h$). The sum of the elements can be written

$$\begin{aligned}
\text{Sum of the Elements in } C_{D \times E} \otimes F_{G \times H} &= \sum_{d=1}^{D} \sum_{e=1}^{E} \sum_{g=1}^{G} \sum_{h=1}^{H} c_{de} f_{gh} \\
&= \left( \sum_{d=1}^{D} \sum_{e=1}^{E} c_{de} \right) \left( \sum_{g=1}^{G} \sum_{h=1}^{H} f_{gh} \right) \qquad (2)
\end{aligned}$$

Recall that the sum of the elements in $C$ and $F$ is also unity. That is, $\sum_{d=1}^{D} \sum_{e=1}^{E} c_{de} = 1$ and
$\sum_{g=1}^{G} \sum_{h=1}^{H} f_{gh} = 1$. Inserting these results into Equation 2 gives the simple result that 1(1) = 1.
Since the elements of $J_1 \otimes J_2 \otimes \cdots \otimes J_K$ correspond with the appropriate values in the JSPM (Equation 1), and
since $J_1 \otimes J_2 \otimes \cdots \otimes J_K$ satisfies the properties of a JSPM (Equation 2), $J_1 \otimes J_2 \otimes \cdots \otimes J_K$ must be a JSPM.
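
Lemma III.1 can be spot-checked numerically. The sketch below assumes three independent two-label classifiers with made-up operating points and priors. The helper `jpm` and all numbers are hypothetical, and `functools.reduce` is used only to chain `np.kron` over the list.

```python
import numpy as np
from functools import reduce

def jpm(p_fa, p_d, alpha):
    """2x2 JPM built as C p from an assumed operating point and prior."""
    cpm = np.array([[1 - p_fa, 1 - p_d],
                    [p_fa,     p_d]])
    return cpm @ np.diag([1 - alpha, alpha])

jpms = [jpm(0.1, 0.8, 0.3), jpm(0.2, 0.9, 0.4), jpm(0.05, 0.7, 0.1)]

# JSPM as the Kronecker product J_1 (x) J_2 (x) J_3 (an 8 x 8 matrix here).
S_J = reduce(np.kron, jpms)

# Cell for labels (h1, f2, h3) and truth (hostile 1, friend 2, hostile 3).
# For 2x2 factors, the Kronecker row/column index is the binary number i1 i2 i3.
row = 1 * 4 + 0 * 2 + 1
col = 1 * 4 + 0 * 2 + 1
direct = jpms[0][1, 1] * jpms[1][0, 0] * jpms[2][1, 1]  # product in Equation 1
```

Under these assumptions `S_J[row, col]` agrees with `direct`, and `S_J.sum()` is one, which mirrors the two parts of the proof.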

3.4.2 Combined Prior Probabilities Matrix. Note that the JSPM is equal to $J_1 \otimes J_2 \otimes
\cdots \otimes J_K = C_1 p_1 \otimes C_2 p_2 \otimes \cdots \otimes C_K p_K$. One property of the Kronecker product is that $AB \otimes CD =
(A \otimes C)(B \otimes D)$ for any matrices $A$, $B$, $C$ and $D$ for which the products $AB$ and $CD$ are defined [11]. Thus, the JSPM can be decomposed into
the matrix product of $C_1 \otimes C_2 \otimes \cdots \otimes C_K$ and $p_1 \otimes p_2 \otimes \cdots \otimes p_K$. We will call the matrix composed
by $p_1 \otimes p_2 \otimes \cdots \otimes p_K$ the Combined Prior Probabilities Matrix (CPPM) and represent it with $P$.
A CPPM for a system in which two classifiers seek two types of targets is given by

$$P = p_1 \otimes p_2 = \begin{bmatrix} (1 - \alpha_1)(1 - \alpha_2) & 0 & 0 & 0 \\ 0 & (1 - \alpha_1)\alpha_2 & 0 & 0 \\ 0 & 0 & \alpha_1 (1 - \alpha_2) & 0 \\ 0 & 0 & 0 & \alpha_1 \alpha_2 \end{bmatrix}$$


The  elements  of  P  represent  the  a  priori  probabilities  of  each  of  the  feature  set  combinations  defined 
by  the  columns  of  the  JSPM. 

Definition III.5. The Combined Prior Probabilities Matrix (CPPM) for a system of classifiers
$\{1, 2, \ldots, K\}$ seeking $K$ different targets is a $2^K \times 2^K$ diagonal matrix in which the diagonal
elements $(j, j)$ give the a priori probability of the true state combinations defined in the $j$th column
of the JSPM. The trace of a CPPM is unity.

Lemma  III. 2.  Consider  an  MCS  in  which  each  individual  classifier  is  charged  with  identifying 
a  different  type  of  target.  Assuming  statistical  independence  between  the  individual  classifiers,  the 
CPPM  is  formed  by  the  Kronecker  product  of  the  PPMs  for  each  classifier  in  the  system. 

Proof III.2. The proof is similar to the proof of Lemma III.1.


3.4.3 Conditional State Probabilities Matrix. Recall that the JSPM for an MCS in which
each classifier seeks a different type of target is given by

$$\begin{aligned}
J_S &= J_1 \otimes J_2 \otimes \cdots \otimes J_K \\
&= C_1 p_1 \otimes C_2 p_2 \otimes \cdots \otimes C_K p_K \\
&= (C_1 \otimes C_2 \otimes \cdots \otimes C_K)(p_1 \otimes p_2 \otimes \cdots \otimes p_K). \qquad (3)
\end{aligned}$$

The matrix $C_1 \otimes C_2 \otimes \cdots \otimes C_K$ is called the Conditional State Probabilities Matrix (CSPM) and
will be represented with $S_C$. The JSPM ($S_C P$) will henceforth be denoted $S_J$. The relationship
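Table 8: Conditional State Probabilities Matrix Composed from 2x2 CPMs.
(Rows: label sets. Columns: feature sets. A bar denotes the complement, $\bar{x} = 1 - x$.)

Label set | $\mathcal{X}_{1,f} \times \mathcal{X}_{2,f}$ | $\mathcal{X}_{1,f} \times \mathcal{X}_{2,h}$ | $\mathcal{X}_{1,h} \times \mathcal{X}_{2,f}$ | $\mathcal{X}_{1,h} \times \mathcal{X}_{2,h}$
$(f_1, f_2)$ | $\bar{P}_{FA,1} \bar{P}_{FA,2}$ | $\bar{P}_{FA,1} \bar{P}_{D,2}$ | $\bar{P}_{D,1} \bar{P}_{FA,2}$ | $\bar{P}_{D,1} \bar{P}_{D,2}$
$(f_1, h_2)$ | $\bar{P}_{FA,1} P_{FA,2}$ | $\bar{P}_{FA,1} P_{D,2}$ | $\bar{P}_{D,1} P_{FA,2}$ | $\bar{P}_{D,1} P_{D,2}$
$(h_1, f_2)$ | $P_{FA,1} \bar{P}_{FA,2}$ | $P_{FA,1} \bar{P}_{D,2}$ | $P_{D,1} \bar{P}_{FA,2}$ | $P_{D,1} \bar{P}_{D,2}$
$(h_1, h_2)$ | $P_{FA,1} P_{FA,2}$ | $P_{FA,1} P_{D,2}$ | $P_{D,1} P_{FA,2}$ | $P_{D,1} P_{D,2}$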

between the CSPM and the CPM is analogous to the relationship between the JSPM and the JPM.
The  columns  of  Sc  correspond  with  combinations  of  feature  sets,  and  the  rows  correspond  with 
combinations  of  labels.  Table  8  gives  an  example  of  a  CSPM. 
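
The factorization in Equation 3 relies on the mixed-product property of the Kronecker product, which can be verified numerically for the two-classifier case. The CPMs and priors below are assumed values chosen only for illustration.

```python
import numpy as np

# Hypothetical CPMs (columns sum to one) and PPMs (traces equal one).
C1 = np.array([[0.9, 0.2], [0.1, 0.8]])
C2 = np.array([[0.8, 0.1], [0.2, 0.9]])
p1 = np.diag([0.7, 0.3])
p2 = np.diag([0.6, 0.4])

S_C = np.kron(C1, C2)   # Conditional State Probabilities Matrix
P = np.kron(p1, p2)     # Combined Prior Probabilities Matrix

# Two routes to the JSPM: factored (S_C P) and direct (J_1 (x) J_2).
S_J_factored = S_C @ P
S_J_direct = np.kron(C1 @ p1, C2 @ p2)
```

Both routes give the same 4 x 4 matrix, each column of `S_C` sums to one, and the trace of `P` is one, consistent with Definitions III.5 and III.6.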

Definition III.6. The Conditional State Probabilities Matrix (CSPM) for a system of
classifiers $\{1, 2, \ldots, K\}$ seeking $K$ different targets is a $\prod_{k=1}^{K} m_k \times 2^K$ matrix in which the columns
correspond with truth, the rows correspond with combinations of output labels from the individual
classifiers, and cell $(i, j)$ gives the conditional probability of the classifier system outputting the
combination of labels $i$ when the true state of the system is $j$. The sum of the elements in each
column of the CSPM is unity.

Lemma  III. 3.  Consider  an  MCS  in  which  each  individual  classifier  is  charged  with  identifying 
a  different  type  of  target.  Assuming  statistical  independence  between  the  individual  classifiers,  the 
CSPM  is  formed  by  the  Kronecker  product  of  the  CPMs  for  each  classifier  in  the  system. 

Proof  III. 3.  The  proof  is  similar  to  the  proof  of  Lemma  III.l.  However,  one  must  show  that  the 
Kronecker  product  of  several  stochastic  matrices  is  also  stochastic.  This  is  a  known  result  for  the 
Kronecker  product  [27]. 


3.4.4 Truth Matrix. Recall the 4 x 4 matrices presented in Tables 7 and 8, and note that
the  rightmost  three  columns  correspond  with  the  presence  of  at  least  one  type  of  target.  Further 
recall  that  the  goal  of  the  system  is  to  determine  if  any  targets  are  present.  In  the  instance  of 
any  of  the  state  combinations  on  these  columns,  at  least  one  type  of  target  is  present.  Since 
the  states  in  the  JSPM  are  mutually  exclusive,  the  probability  of  either  of  two  states  occurring 
Table 9: Result After Post-Multiplying a JSPM by a Truth Matrix.
(Rows: label sets. Columns: truth.)

Label set | No Target | Target Present
$(f_1, f_2)$ | $S_{J,(1,1)}$ | $S_{J,(1,2)} + S_{J,(1,3)} + S_{J,(1,4)}$
$(f_1, h_2)$ | $S_{J,(2,1)}$ | $S_{J,(2,2)} + S_{J,(2,3)} + S_{J,(2,4)}$
$(h_1, f_2)$ | $S_{J,(3,1)}$ | $S_{J,(3,2)} + S_{J,(3,3)} + S_{J,(3,4)}$
$(h_1, h_2)$ | $S_{J,(4,1)}$ | $S_{J,(4,2)} + S_{J,(4,3)} + S_{J,(4,4)}$

is the sum of the probabilities of the two states. Therefore, we can add the last three columns
($S_{J,(\cdot,2)} + S_{J,(\cdot,3)} + S_{J,(\cdot,4)}$) to arrive at the probability of a target being present under each possible
label set. Conversely, the first column alone gives us the probability of no targets present under each label set.
We can calculate both of these values by post-multiplying $S_J$ by a truth matrix $T$, which takes the
form

$$T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}.$$

Now we are left with a 4 x 2 matrix in which the rows correspond with each possible combination of
labels, the first column corresponds with the absence of targets, and the second column corresponds
with the presence of a target (or targets). Table 9 illustrates the resulting matrix for the 2 x 2
example. The cells of the matrix shown in Table 9 give the probabilities of a particular label when
the system is in a particular state (target or not). For example, cell (3, 2) gives
$\Pr\{\text{Target Present} \cap (h_1, f_2)\}$.
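
The column collapse performed by the truth matrix can be illustrated with a short NumPy sketch. The two JPMs below are assumed example values (each sums to one); only the structure of $T$ comes from the text.

```python
import numpy as np

# Hypothetical JPMs for two classifiers.
J1 = np.array([[0.63, 0.06], [0.07, 0.24]])
J2 = np.array([[0.48, 0.04], [0.12, 0.36]])
S_J = np.kron(J1, J2)           # 4 x 4 JSPM

# Truth matrix: column 0 = no target, column 1 = at least one target present.
T = np.array([[1, 0],
              [0, 1],
              [0, 1],
              [0, 1]])

collapsed = S_J @ T             # 4 x 2 matrix laid out as in Table 9
```

The first column of `collapsed` equals the first column of `S_J`, and the second column equals the row-wise sum of the remaining three columns.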


Definition III.7. A Truth Matrix $T$ for an MCS combining decisions across target types is a
$2^K \times 2$ matrix containing binary values in which row $i$ corresponds with column $i$ of the JSPM,
the first column corresponds with the absence of hostile targets, and the second column corresponds
with the presence of hostile targets. The columns of the Truth Matrix must be orthogonal. That is,
if the $i$th column of the JSPM corresponds with the presence of at least one target, the $(i, 2)$ cell of
the Truth Matrix will contain a one. Otherwise, the $(i, 1)$ cell of the Truth Matrix will contain a one.
Lemma III.4. Consider an MCS in which each individual classifier is charged with identifying a
different type of target. The truth matrix for such a system always has a one in the (1,1) cell and
zeros in all other cells of the first column. Consequently, the truth matrix also always has a zero in
the (1,2) cell and ones in the remaining cells of the second column.

Proof III.4. Assuming the CPMs for the individual classifiers were built with the previously defined
convention (i.e., the first column corresponds with the absence of a target, the second column
corresponds with the presence of a target, and the last row corresponds with the hostile label), the
only column of the JSPM corresponding with a complete absence of targets must be the first column.
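
Lemma III.4 suggests a simple way to generate the truth matrix for any number of classifiers. The following is a sketch under the column convention stated in the proof, and the function name `truth_matrix` is hypothetical.

```python
import numpy as np

def truth_matrix(num_classifiers):
    """Truth matrix for an across-target-type system of K two-truth-state classifiers."""
    n_states = 2 ** num_classifiers      # one row per JSPM column
    T = np.zeros((n_states, 2), dtype=int)
    T[0, 0] = 1    # only the first JSPM column has no target of any type
    T[1:, 1] = 1   # every other column has at least one target present
    return T

T2 = truth_matrix(2)
T3 = truth_matrix(3)
```

`T2` reproduces the 4 x 2 matrix used in the two-classifier example, and the two columns of any such matrix are orthogonal because each row contains exactly one nonzero entry.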

3.4.5 Fusion Rule Matrix. Recall that a logical fusion rule selects combinations of labels
for which the system concludes that a target is present. For example, the AND fusion rule will
conclude that a target is present only if all classifiers in the ensemble conclude that a target is
present. The AND rule corresponds with only the last row of the matrix in Table 9.
Since the AND rule will lead the system to conclude a target is present if and only if all classifiers
determine a target is present, AND is generally a conservative rule, which makes a false alarm
less likely but also gives a lower probability of detection. A visual depiction of the results of the
AND rule for the previous example is given in Figure 7.

The OR rule will decide that a target is present if any of the classifiers identify a target. The
OR rule corresponds with rows two through four of the matrix in Table 9, or the union of the
label sets $(f_1, h_2) \cup (h_1, f_2) \cup (h_1, h_2)$. The OR rule is more aggressive than the AND rule, usually
yielding a higher false alarm rate as well as a higher detection probability. An illustration of the
results of the OR rule for the previous example is given in Figure 8.

Once again we use the fact that the events defined by the cells of the matrix in Table 9 are
mutually exclusive. If we want to determine the probability of identifying a target under a particular
fusion rule, we can add the appropriate cells from the second column. If we want to determine the
probability of identifying a target when none exists, we can add the corresponding three cells from
Figure 7: Classification Errors for a Two-Classifier System in Which Decisions are Combined
Using the Logical AND Rule.

Figure 8: Classification Errors for a Two-Classifier System in Which Decisions are Combined
Using the Logical OR Rule.
(Legend: member of $\mathcal{X}_f$ labeled $h$; member of $\mathcal{X}_h$ labeled $f$; correct classification.)

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the  first  column.  Recall  that  the  V  rule  corresponds  with  the  last  three  rows  of  the  matrix  in  Table 
9.  Now  we  can  define  a  fusion  vector  as  a  column  vector  of  zeros  and  ones,  the  ones  corresponding 
to  the  rows  appropriate  for  that  rule.  The  fusion  vector  for  the  OR  rule  for  our  2x2  example  is 

0 
1 

roR  = 

1 
1 

This  fusion  vector  is  similar  to  the  “rule  of  engagement”  defined  in  [22]  and  the  rule  vectors  (for 
three  classifiers)  defined  in  [?]. 

Pre-multiplying the matrix in Table 9 (which was computed with the formula $S_c P T$) by
the transpose of the fusion vector (to preserve dimensionality) gives us a row vector containing the
probabilities of misclassifying a friendly object and correctly identifying a hostile target, and
pre-multiplying $S_c P T$ by the transpose of the complementary vector to $r_{OR}$, $r_{OR}^{c} = (1, 0, 0, 0)^T$,
gives us the probabilities of correctly identifying a friendly object and misclassifying a hostile target:

$$
r_{OR}^{T} S_c P T = \begin{bmatrix} \Pr\{h_S \cap \text{No Target}\} & \Pr\{h_S \cap \text{Target Present}\} \end{bmatrix}
$$

and

$$
(r_{OR}^{c})^{T} S_c P T = \begin{bmatrix} \Pr\{f_S \cap \text{No Target}\} & \Pr\{f_S \cap \text{Target Present}\} \end{bmatrix}.
$$

When the two vectors are augmented in the form $[\, r_{OR}^{c} \;\; r_{OR} \,]$, the result is the fusion rule matrix

$$
R_{OR} = \begin{bmatrix} r_{OR}^{c} & r_{OR} \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}.
$$

A  more  general  definition  is  given  by  the  following. 

Definition III.8. A Fusion Rule Matrix $R$ is a $2^K \times m_S$ matrix containing binary values in
which row $i$ corresponds with row $i$ of the JSPM, the first column corresponds with combinations of
output labels for which the system concludes there is no target present, the last column corresponds
with combinations of output labels for which the system concludes a hostile target is present, and the
columns in between (if $m_S > 2$) correspond with intermediate fuzzy labels. For example, if the system
is to conclude that a hostile target is present for a combination of output states corresponding with
the $i$th row of the JSPM, the $(i, m_S)$ cell of the Fusion Rule Matrix will contain a 1. The columns
of a Fusion Rule Matrix must be orthogonal.
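As an illustration of Definition III.8, the following sketch builds two-column fusion rule matrices for a few Boolean rules. It assumes two-label classifiers and rows ordered as in a Kronecker product (first row all friendly, last row all hostile); the names `fusion_rule_matrix` and `majority` are hypothetical helpers introduced here, not part of the original development.

```python
import itertools
import numpy as np

def fusion_rule_matrix(K, rule):
    """Build a 2^K x 2 binary fusion rule matrix for K two-label classifiers.

    Rows follow Kronecker-product order: row 0 is (f, f, ..., f) and the
    last row is (h, h, ..., h). Column 0 marks label combinations mapped to
    "no target"; column 1 marks combinations mapped to "hostile".
    `rule` takes a tuple of 0/1 votes (1 = hostile) and returns True when
    the system should declare a target.
    """
    R = np.zeros((2 ** K, 2), dtype=int)
    for i, votes in enumerate(itertools.product((0, 1), repeat=K)):
        R[i, 1 if rule(votes) else 0] = 1
    return R

def majority(votes):
    # Declare a target when more than half of the classifiers say hostile.
    return sum(votes) > len(votes) / 2

R_or = fusion_rule_matrix(2, any)        # OR rule, two classifiers
R_and = fusion_rule_matrix(2, all)       # AND rule, two classifiers
R_maj = fusion_rule_matrix(3, majority)  # majority vote, three classifiers
```

Because every row contains exactly one 1, the two columns are orthogonal by construction, which is the property the definition requires.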

3.4.6 System Joint Performance Matrix.

Theorem III.1. Consider an MCS in which each individual classifier is charged with identifying
a different type of target. If the classifiers are statistically independent, the system JPM, $J_S$, can
be computed with the formula $R^T S_c P T$, where $S_c$ and $P$ are computed using Lemmas III.3 and
III.2.

Proof III.1. Recall the proofs of Lemmas III.3 and III.2 and the result that $S_J = S_c P$. Then
pre- and post-multiplying $S_J$ by $R^T$ and $T$, respectively, simply computes the sums of appropriate
mutually exclusive probabilities. These sums correspond to the appropriate probabilities summarized
in the JPM.

• The (1,1) cell is equivalent to $R_1^T S_J T_1$, which computes the sum of the cells of $S_J$ that (a)
are labeled friendly (by the rule matrix), and (b) are actually friendly (by the truth matrix).

• The (1,2) cell is equivalent to $R_1^T S_J T_2$, which computes the sum of the cells of $S_J$ that (a)
are labeled friendly, and (b) are actually hostile.

• The (2,1) cell is equivalent to $R_2^T S_J T_1$, which computes the sum of the cells of $S_J$ that (a)
are labeled hostile, and (b) are actually friendly.

• The (2,2) cell is equivalent to $R_2^T S_J T_2$, which computes the sum of the cells of $S_J$ that (a)
are labeled hostile, and (b) are actually hostile.

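The result is a matrix with the following construction:

$$
R^T S_c P T =
\begin{bmatrix}
\Pr\{A_S(x_S) = 1 \cap x_S \in X_{S,f}\} & \Pr\{A_S(x_S) = 1 \cap x_S \in X_{S,h}\} \\
\vdots & \vdots \\
\Pr\{A_S(x_S) = m_S \cap x_S \in X_{S,f}\} & \Pr\{A_S(x_S) = m_S \cap x_S \in X_{S,h}\}
\end{bmatrix},
$$

which satisfies the definition of a JPM.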
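The claim of Theorem III.1 can be checked numerically on a small case. The sketch below assumes a two-classifier across-fusion system under the OR rule, with CPMs laid out as columns for truth (friendly, hostile) and rows for labels (friendly, hostile); all probability values are made up for illustration. It compares $R^T S_c P T$ against a direct enumeration of the mutually exclusive (label, truth) events.

```python
import itertools
import numpy as np

# Hypothetical operating points and priors for two independent classifiers.
p_fa = [0.10, 0.20]   # false alarm probabilities (assumed)
p_d = [0.80, 0.70]    # detection probabilities (assumed)
alpha = [0.30, 0.40]  # prior probability that each target type is present (assumed)

def cpm(pfa, pd):
    # Columns: truth (friendly, hostile); rows: label (friendly, hostile).
    return np.array([[1 - pfa, 1 - pd], [pfa, pd]])

def ppm(a):
    # Diagonal prior probabilities matrix for one classifier.
    return np.diag([1 - a, a])

S_c = np.kron(cpm(p_fa[0], p_d[0]), cpm(p_fa[1], p_d[1]))  # 4 x 4 CSPM
P = np.kron(ppm(alpha[0]), ppm(alpha[1]))                  # 4 x 4 CPPM
T = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])             # truth matrix
R = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])             # OR fusion rule matrix

J_s = R.T @ S_c @ P @ T  # matrix formula for the system JPM

# Direct enumeration: add up each mutually exclusive (labels, truth) event.
J_check = np.zeros((2, 2))
for labels in itertools.product((0, 1), repeat=2):
    for truth in itertools.product((0, 1), repeat=2):
        prob = 1.0
        for k in range(2):
            prior = alpha[k] if truth[k] else 1 - alpha[k]
            p_hostile = p_d[k] if truth[k] else p_fa[k]
            prob *= prior * (p_hostile if labels[k] else 1 - p_hostile)
        # Row: system label under OR; column: whether any target is present.
        J_check[int(any(labels)), int(any(truth))] += prob
```

Under these assumptions the two matrices agree, and the entries of `J_s` sum to one, as expected of a joint distribution over system labels and true states.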


3.4.7 System Prior Probabilities Matrix. Recall that the JPM for a classifier $k$ can
be computed with $J_k = C_k p_k$. Thus, one can post-multiply $J_S$ by the inverse of the system
prior probabilities matrix to compute $C_S$, but $p_S$ has not yet been computed. Recall the matrix
$P = p_1 \otimes p_2 \otimes \cdots \otimes p_K$, or in terms of the example

$$
P = p_1 \otimes p_2 =
\begin{bmatrix}
(1 - \alpha_1)(1 - \alpha_2) & 0 & 0 & 0 \\
0 & (1 - \alpha_1)\alpha_2 & 0 & 0 \\
0 & 0 & \alpha_1 (1 - \alpha_2) & 0 \\
0 & 0 & 0 & \alpha_1 \alpha_2
\end{bmatrix}.
$$


Theorem III.2. Consider an MCS in which each individual classifier is charged with identifying
a different type of target. The system PPM, $p_S$, can be computed by pre-multiplying the CPPM by
the transpose of the truth matrix, $T^T$, and post-multiplying by the truth matrix $T$.




Proof III.2. Recall that each row/column of the diagonal CPPM corresponds with a particular
combination of true events (i.e., the Cartesian product of two particular feature sets) defined by
the columns of the CSPM and JSPM. The last $2^K - 1$ rows/columns of the CPPM correspond with
any event where a target is present, and the first row/column corresponds with instances where no
target is present. Pre- and post-multiplying by the truth matrix $T$ computes the sums of appropriate
mutually exclusive probabilities. These sums correspond to the appropriate probabilities summarized
in the PPM.

• The (1,1) cell is equivalent to $T_1^T P T_1$, which computes the sum of the cells of $P$ that coincide
with the absence of a target.

• The (1,2) and (2,1) cells are equivalent to $T_1^T P T_2$ and $T_2^T P T_1$, respectively. The result for
either is always zero, since $P$ is diagonal and the columns of $T$ are orthogonal.

• The (2,2) cell is equivalent to $T_2^T P T_2$, which computes the sum of the cells of $P$ that coincide
with the presence of at least one target.

The result is a matrix with the following construction,

$$
T^T P T =
\begin{bmatrix}
\Pr\{\text{No Target}\} & 0 \\
0 & \Pr\{\text{Target Present}\}
\end{bmatrix},
$$

which satisfies the definition of a PPM.

3.4.8 System Conditional Performance Matrix. Using the previously developed formula,
one can now compute

$$
C_S = J_S p_S^{-1} = R^T S_c P T p_S^{-1} =
\begin{bmatrix}
\Pr\{A_S(x_S) = 1 \mid x_S \in X_{S,f}\} & \Pr\{A_S(x_S) = 1 \mid x_S \in X_{S,h}\} \\
\vdots & \vdots \\
\Pr\{A_S(x_S) = m_S \mid x_S \in X_{S,f}\} & \Pr\{A_S(x_S) = m_S \mid x_S \in X_{S,h}\}
\end{bmatrix}.
\tag{4}
$$
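The chain of computations leading to Equation (4) can be sketched end to end as follows. This is a minimal illustration, assuming an across-fusion system of two independent two-label classifiers under the OR rule; the function name `system_cpm` and all numerical values are hypothetical.

```python
from functools import reduce
import numpy as np

def system_cpm(cpms, ppms, R, T):
    """Return (J_s, p_s, C_s) for an across-fusion system of independent classifiers."""
    S_c = reduce(np.kron, cpms)      # conditional state probabilities matrix
    P = reduce(np.kron, ppms)        # combined prior probabilities matrix
    J_s = R.T @ S_c @ P @ T          # system JPM (Theorem III.1)
    p_s = T.T @ P @ T                # system PPM (Theorem III.2)
    C_s = J_s @ np.linalg.inv(p_s)   # system CPM, Equation (4)
    return J_s, p_s, C_s

# Assumed CPMs: columns are truth (friendly, hostile), rows are labels (friendly, hostile).
cpms = [np.array([[0.9, 0.2], [0.1, 0.8]]),
        np.array([[0.8, 0.3], [0.2, 0.7]])]
# Assumed PPMs: diag(Pr{no target of type k}, Pr{target of type k}).
ppms = [np.diag([0.7, 0.3]), np.diag([0.6, 0.4])]

T = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])     # truth matrix
R_or = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])  # OR fusion rule matrix

J_s, p_s, C_s = system_cpm(cpms, ppms, R_or, T)
p_fa_s, p_d_s = C_s[1, 0], C_s[1, 1]  # system false alarm and detection probabilities
```

The inverse exists whenever both system priors are nonzero, since $p_S$ is diagonal. Each column of `C_s` sums to one, so the bottom row alone gives a single point $(P_{FA,S}, P_{D,S})$ on the system ROC plane.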


3.4.9 Summary. This section provided definitions for the CPM, JPM, PPM, JSPM,
CPPM,  CSPM,  Truth  Matrix,  and  Fusion  Rule  Matrix.  Moreover,  this  section  contained  the 
derivation  of  a  formula  for  computing  the  system  PPM,  CPM,  and  JPM  for  an  MCS  in  which 
each  individual  classifier  is  charged  with  identifying  a  different  type  of  target  and  the  individual 
decisions  are  combined  using  Boolean  fusion  rules. 


3.5  Within  Fusion  State  Probabilities 

Consider a system made up of $K$ classifiers, each trying to detect the same type of target.
The set of events $\mathcal{E}$ is partitioned into two subsets: $\mathcal{E}_h$ consists of hostile targets and $\mathcal{E}_f$ includes all
events that are not elements of $\mathcal{E}_h$. Assume that the CPM and JPM are available for each classifier,
and note that the classifiers all share the same PPM since they all seek the same type of target.
One would like to be able to compute the system CPM, PPM, and JPM as before, but this
scenario possesses some properties that necessitate some changes in the computations of the state
probabilities matrices (CSPM and JSPM) and the CPPM. The decisions from each classifier are
sent to a fusion center or combiner, where a fusion rule is applied to the labels. The result is the
decision for the classifier system in terms of the system label set $\mathcal{L}_S = \{1, 2, \ldots, m_S\}$. Figure 9
shows a two-classifier system in which each classifier can output two labels for an arbitrary fusion
rule $R$.
Figure 9: Event, Feature and Label Sets for a Multiple Classifier System Combining Decisions
Within Target Types.

3.5.1 Conditional State Probabilities Matrix. Reconsider the CSPM for the system of
classifiers fused across target types appearing in Table 8. This was the simplest example of a 4 x 4
CSPM, but it is adequate for illustrating the difference between the two cases. The second and
third columns of the CSPM in Table 8 correspond with events where one type of target is present
and the other type is not. A scenario such as this is impossible for the system presently being
considered, because all the classifiers are seeking the same type of target (i.e., there are only two
possible true states). One can use the Kronecker product to compute the possible combinations of
the elements from the CPMs, but the result must be modified to account for these impossibilities.
A simple way of removing them is to post-multiply the Kronecker product of the CPMs by a $2^K \times 2$
matrix with ones in the $(1, 1)$ and $(2^K, 2)$ cells. The result is a $2^K \times 2$ matrix consisting of the first
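and last columns of the Kronecker product term. For example,

$$
\begin{aligned}
S_{c,Within} &= (C_1 \otimes C_2)
\begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \\
&=
\begin{bmatrix}
(1 - P_{FA,1})(1 - P_{FA,2}) & (1 - P_{FA,1})(1 - P_{D,2}) & (1 - P_{D,1})(1 - P_{FA,2}) & (1 - P_{D,1})(1 - P_{D,2}) \\
(1 - P_{FA,1}) P_{FA,2} & (1 - P_{FA,1}) P_{D,2} & (1 - P_{D,1}) P_{FA,2} & (1 - P_{D,1}) P_{D,2} \\
P_{FA,1} (1 - P_{FA,2}) & P_{FA,1} (1 - P_{D,2}) & P_{D,1} (1 - P_{FA,2}) & P_{D,1} (1 - P_{D,2}) \\
P_{FA,1} P_{FA,2} & P_{FA,1} P_{D,2} & P_{D,1} P_{FA,2} & P_{D,1} P_{D,2}
\end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \\
&=
\begin{bmatrix}
(1 - P_{FA,1})(1 - P_{FA,2}) & (1 - P_{D,1})(1 - P_{D,2}) \\
(1 - P_{FA,1}) P_{FA,2} & (1 - P_{D,1}) P_{D,2} \\
P_{FA,1} (1 - P_{FA,2}) & P_{D,1} (1 - P_{D,2}) \\
P_{FA,1} P_{FA,2} & P_{D,1} P_{D,2}
\end{bmatrix}.
\end{aligned}
$$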

Definition III.9. The Conditional State Probabilities Matrix (CSPM) for a system of
classifiers $\{1, 2, \ldots, K\}$ seeking one type of target is a $2^K \times 2$ matrix in which the first column corresponds
with instances where the target is absent, the second column corresponds with instances where the
target is present, the rows correspond with combinations of output labels from the individual
classifiers, and cell $(i, j)$ gives the conditional probability of the classifier system outputting the
combination of labels $i$ when the true state of the system is $j$ (i.e., $\Pr\{A_S(x_S) = (l_1, l_2, \ldots, l_K) \mid x_S \in X_{S,j}\}$).
The sum of the elements in each column of the CSPM is unity.




Lemma III.5. Consider an MCS in which each individual classifier is charged with identifying
the same type of target. Assuming statistical independence between the individual classifiers, the
CSPM is formed by post-multiplying the Kronecker product of the CPMs for each classifier in the
system by a $2^K \times 2$ matrix with ones in the $(1, 1)$ and $(2^K, 2)$ cells.

Proof III.5. Let $l_k$ be a generic output label for each classifier $k \in \{1, 2, \ldots, K\}$, and let $X_{true}$ be a generic
true state relative to the target sought. Since the classifiers are statistically independent, the
probability for a given combination of labels and truth is

$$
\begin{aligned}
\Pr\{l_1 \times l_2 \times \cdots \times l_K \mid x_1 \times x_2 \times \cdots \times x_K \in X_{true}\}
&= \frac{\Pr\{l_1 \times l_2 \times \cdots \times l_K \cap x_1 \times x_2 \times \cdots \times x_K \in X_{true}\}}
        {\Pr\{x_1 \times x_2 \times \cdots \times x_K \in X_{true}\}} \\
&= \frac{\Pr\{l_1 \cap x_1 \in X_{true}\} \cdot \ldots \cdot \Pr\{l_K \cap x_K \in X_{true}\}}
        {\Pr\{x_1 \in X_{true}\} \cdot \ldots \cdot \Pr\{x_K \in X_{true}\}} \\
&= \Pr\{l_1 \mid x_1 \in X_{true}\} \cdot \ldots \cdot \Pr\{l_K \mid x_K \in X_{true}\} \\
&= \prod_{k=1}^{K} \Pr\{l_k \mid x_k \in X_{true}\}.
\end{aligned}
\tag{5}
$$

Note that each of the terms in the product defined in Equation 5 is an element of a different CPM.
Since the Kronecker product of several CPMs consists of all possible products of the entries of the
individual CPMs, the product in Equation 5 must be an element of the Kronecker product defined
by $C_1 \otimes C_2 \otimes \cdots \otimes C_K$. Post-multiplying the result by a $2^K \times 2$ matrix with ones in the $(1, 1)$ and
$(2^K, 2)$ cells leaves only the first and last columns, the columns corresponding with the target being
absent or present, respectively.
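The construction in Lemma III.5 can be sketched in a few lines. The helper name `within_cspm` and the example CPM values are assumptions made for illustration; the CPMs are laid out with columns for truth (absent, present) and rows for labels (friendly, hostile).

```python
from functools import reduce
import numpy as np

def within_cspm(cpms):
    """Within-fusion CSPM: Kronecker product of the CPMs, keeping only the
    first (target absent) and last (target present) columns."""
    K = len(cpms)
    select = np.zeros((2 ** K, 2))
    select[0, 0] = 1.0    # the (1, 1) cell
    select[-1, 1] = 1.0   # the (2^K, 2) cell
    return reduce(np.kron, cpms) @ select

# Assumed CPMs for two classifiers seeking the same target type.
C_1 = np.array([[0.9, 0.2], [0.1, 0.8]])
C_2 = np.array([[0.8, 0.3], [0.2, 0.7]])

S_c = within_cspm([C_1, C_2])
# Rows are label combinations (f,f), (f,h), (h,f), (h,h).
```

Each column of the result still sums to one, which matches the closing requirement of Definition III.9.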


3.5.2 Combined Prior Probabilities Matrix. In a within fusion system, computation of
the CPPM is trivial. Note that the CSPM is $2^K \times 2$. Therefore, one needs to post-multiply the
CSPM by a 2 x 2 CPPM to arrive at a properly dimensioned JSPM. Since each classifier seeks the
same type of target, the a priori probability of the system being in the hostile state is the same as
the a priori probabilities for all the classifiers in the system. Thus, the 2 x 2 matrix required is
simply the PPM shared by the individual classifiers in the system (i.e., $P = p$).
Definition III.10. The Combined Prior Probabilities Matrix (CPPM) for a system of
classifiers $\{1, 2, \ldots, K\}$ seeking one type of target is equivalent to the PPM for each classifier in the
system.

3.5.3 Joint State Probabilities Matrix. The JSPM $S_J$ can now be computed with

$$
S_J = S_c P.
$$

Definition III.11. The Joint State Probabilities Matrix (JSPM) for a system of classifiers
$\{1, 2, \ldots, K\}$ seeking one type of target is a $2^K \times 2$ matrix in which the first column corresponds with
instances when the target is absent, the second column corresponds with instances when the target is
present, the rows correspond with combinations of output labels from the individual classifiers, and
cell $(i, j)$ gives the probability of the classifier system outputting the combination of labels $i$ when
the true state of the system is $j$ (i.e., $\Pr\{A_S(x_S) = (l_1, l_2, \ldots, l_K) \cap x_S \in X_{S,j}\}$). The sum of the
elements in the JSPM is unity.

Lemma III.6. Consider an MCS in which each individual classifier is charged with identifying the
same type of target. Assuming statistical independence between the individual classifiers, the JSPM
is formed by post-multiplying the CSPM, $S_c$, by the CPPM, $P$.

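Proof III.6. Using the definition of conditional probability, one can post-multiply a matrix with
the form

$$
\begin{bmatrix}
\Pr\{\text{Label Combination } 1 \mid \text{No Target}\} & \Pr\{\text{Label Combination } 1 \mid \text{Target}\} \\
\Pr\{\text{Label Combination } 2 \mid \text{No Target}\} & \Pr\{\text{Label Combination } 2 \mid \text{Target}\} \\
\vdots & \vdots \\
\Pr\{\text{Label Combination } 2^K \mid \text{No Target}\} & \Pr\{\text{Label Combination } 2^K \mid \text{Target}\}
\end{bmatrix}
$$

by a matrix with the form

$$
\begin{bmatrix}
\Pr\{\text{No Target}\} & 0 \\
0 & \Pr\{\text{Target}\}
\end{bmatrix}
$$

to yield a matrix with the form

$$
\begin{bmatrix}
\Pr\{\text{Label Combination } 1 \cap \text{No Target}\} & \Pr\{\text{Label Combination } 1 \cap \text{Target}\} \\
\Pr\{\text{Label Combination } 2 \cap \text{No Target}\} & \Pr\{\text{Label Combination } 2 \cap \text{Target}\} \\
\vdots & \vdots \\
\Pr\{\text{Label Combination } 2^K \cap \text{No Target}\} & \Pr\{\text{Label Combination } 2^K \cap \text{Target}\}
\end{bmatrix}.
$$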


3.5.4 Truth and Fusion Rule Matrices. Using the definitions above for a within system,
the JSPM is a $2^K \times 2$ matrix in which each column corresponds with truth. Thus, the truth matrix
is no longer necessary because of the structure of this special case. One could consider the $2^K \times 2$
matrix used to compute the CSPM (i.e., the matrix with ones in the $(1, 1)$ and $(2^K, 2)$ cells) the
truth matrix, because its purpose is to eliminate impossible scenarios leaving only the two possible
states (friend or hostile). Fusion rule matrices are defined in exactly the same way as in an across
fusion system, and the system PPM is equivalent to the PPM shared by the individual classifiers.
The formulae for computing the CPM and JPM are identical:

$$
J_S = R^T S_c P T \quad \text{and} \quad C_S = J_S p_S^{-1}.
$$
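To make the within-fusion case concrete, the sketch below evaluates a three-classifier majority vote. It takes the reading suggested above in which the $2^K \times 2$ selection matrix plays the role of the truth matrix and is already folded into $S_c$, so that the JPM is computed as $R^T S_c p$; that reading, the shared operating point, and the prior are all assumptions made for illustration.

```python
import itertools
from functools import reduce
import numpy as np

K = 3
p_fa, p_d, alpha = 0.10, 0.80, 0.25   # assumed values shared by all three classifiers

C = np.array([[1 - p_fa, 1 - p_d], [p_fa, p_d]])  # CPM of a single classifier
p = np.diag([1 - alpha, alpha])                   # shared PPM, so P = p

# Selection matrix keeping the "target absent" and "target present" columns.
select = np.zeros((2 ** K, 2))
select[0, 0] = select[-1, 1] = 1.0
S_c = reduce(np.kron, [C] * K) @ select           # 2^K x 2 within-fusion CSPM

# Majority-vote fusion rule matrix (column 0: friendly, column 1: hostile).
R = np.zeros((2 ** K, 2))
for i, votes in enumerate(itertools.product((0, 1), repeat=K)):
    R[i, int(sum(votes) >= 2)] = 1.0

J_s = R.T @ S_c @ p             # system JPM
C_s = J_s @ np.linalg.inv(p)    # system CPM
p_fa_s, p_d_s = C_s[1, 0], C_s[1, 1]
```

Because the system PPM equals the shared PPM in this case, the priors cancel and `C_s` reduces to `R.T @ S_c`; the system operating point then depends only on the individual CPMs and the fusion rule.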


3.5.5  Summary.  This  section  provided  definitions  for  the  CPM,  JPM,  PPM,  JSPM, 
CPPM,  CSPM,  Truth  Matrix,  and  Fusion  Rule  Matrix  for  an  MCS  fusing  decisions  within  target 
types.  This  section  also  reiterated  the  formulae  for  computing  the  system  CPM  and  JPM  for  an 
MCS  in  which  the  individual  decisions  are  combined  using  Boolean  fusion  rules. 
3.6 Estimating ROC Curves


3.6.1  Overview.  This  section  suggests  methods  for  estimating  the  ROC  curve  using  the 
system  CPM  for  an  MCS  combining  decisions  within  or  across  target  types.  Oxley  and  Bauer’s 
method  of  ROC  fusion  can  be  applied  for  Logical  AND  and  OR  rules;  however,  other  techniques  are 
necessary  for  more  complex  rules  (e.g.,  a  majority  vote).  Liggins  showed  that  there  are  7  relevant 
rules  other  than  the  AND  and  OR  for  combining  the  decisions  of  three  classifiers  [18].  Moreover, 
as  the  number  of  classifiers  in  the  system  gets  larger,  there  are  even  more  relevant  rules  besides 
the  AND  and  OR  rules.  One  might  hope  to  find  a  way  to  estimate  the  ROC  curves  for  systems 
combined  using  more  complex  fusion  rules.  Of  these,  the  majority  vote  seems  to  be  the  most 
complex. 

3.6.2  ROC  Fusion. 

3.6.2.1 Logical AND. One can employ Oxley and Bauer’s method to analytically
estimate the system ROC curve for a within or across MCS of any size if the system only outputs
two labels (friendly or hostile) [20]. The key to their formula for the AND rule was the observation
that, under an AND rule, the system probability for assigning a hostile label is equal to the product
of the probabilities of the individual classifiers assigning a hostile label:

$$
\Pr\{h_S\} = \Pr\{h_1\} \cdot \Pr\{h_2\}
$$

$$
(1 - \alpha_S) P_{FA,S} + \alpha_S P_{D,S} = \big((1 - \alpha_1) P_{FA,1} + \alpha_1 P_{D,1}\big) \big((1 - \alpha_2) P_{FA,2} + \alpha_2 P_{D,2}\big).
$$

Maximizing both sides of the last equation with respect to the individual threshold values and
manipulating the result allows one to derive a formula for the maximum $P_{D,S}$ for a given $P_{FA,S}$.
The formula can easily be adapted to account for any number of classifiers by using the property

$$
\Pr\{h_S\} = \prod_{k=1}^{K} \Pr\{h_k\}.
$$




3.6.2.2 Logical OR. Oxley and Bauer derived a similar formula for the OR rule.
The key to this formula was the observation that, under an OR rule, the system probability for
assigning a friendly label is equal to the product of the probabilities of the individual classifiers
assigning a friendly label:

$$
\Pr\{f_S\} = \Pr\{f_1\} \cdot \Pr\{f_2\}
$$

$$
(1 - \alpha_S)(1 - P_{FA,S}) + \alpha_S (1 - P_{D,S}) = \big[(1 - \alpha_1)(1 - P_{FA,1}) + \alpha_1 (1 - P_{D,1})\big]
\cdot \big[(1 - \alpha_2)(1 - P_{FA,2}) + \alpha_2 (1 - P_{D,2})\big].
$$

Minimizing both sides of the last equation with respect to the individual threshold values and
manipulating the result gives a formula for $\min(-P_{D,S})$, which is equivalent to $\max P_{D,S}$. This
property can be adapted in a similar manner such that

$$
\Pr\{f_S\} = \prod_{k=1}^{K} \Pr\{f_k\}.
$$
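The two product identities used for the AND and OR rules can be checked against the matrix representation. The sketch below assumes two statistically independent classifiers fused across target types, with made-up operating points and priors; it forms the joint state probabilities as the product of the Kronecker products of the CPMs and PPMs and reads off the all-hostile and all-friendly label combinations.

```python
import numpy as np

# Assumed operating points and priors for two independent classifiers.
p_fa = np.array([0.10, 0.20])
p_d = np.array([0.80, 0.70])
alpha = np.array([0.30, 0.40])

# CPMs (columns: truth friendly/hostile; rows: label friendly/hostile) and PPMs.
C = [np.array([[1 - f, 1 - d], [f, d]]) for f, d in zip(p_fa, p_d)]
p = [np.diag([1 - a, a]) for a in alpha]

# Joint probabilities of every (label combination, truth combination) pair.
S_j = np.kron(C[0], C[1]) @ np.kron(p[0], p[1])
label_probs = S_j.sum(axis=1)  # marginal probability of each label combination

# Unconditional probability that each classifier outputs a hostile label.
pr_h = (1 - alpha) * p_fa + alpha * p_d

pr_h_and = label_probs[-1]  # AND declares hostile only for (h, h)
pr_f_or = label_probs[0]    # OR declares friendly only for (f, f)
```

Under the independence assumption, `pr_h_and` matches the product of the individual hostile-label probabilities and `pr_f_or` matches the product of the individual friendly-label probabilities, which are the two observations on which the AND and OR formulas rest.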


3.6.3 Lagrangian Optimization. Another method for estimating the system ROC curve is
to use a Lagrangian formulation like the one used in CFAR applications [28]. A typical Lagrangian
equation $L = f(x) - \lambda g(x)$ is appropriate in the following form:

$$
L = P_{D,S} - \lambda (P_{FA,S} - \rho),
$$

where $\rho$ is any allowable false positive rate. Differentiating with respect to the threshold values (or
$P_{FA}$) and the Lagrange multiplier gives a system of nonlinear equations (when K > 3) that can be
solved to determine the maximum $P_{D,S}$.
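Written out, the system of equations takes the following form. Here $\theta_k$ is a symbol introduced only for this illustration to denote the decision threshold of classifier $k$, $\lambda$ is the Lagrange multiplier, and $\rho$ denotes the allowable false positive rate; $P_{D,S}$ and $P_{FA,S}$ are regarded as functions of $(\theta_1, \ldots, \theta_K)$ through the system CPM.

```latex
\begin{aligned}
\frac{\partial L}{\partial \theta_k}
  &= \frac{\partial P_{D,S}}{\partial \theta_k}
   - \lambda \frac{\partial P_{FA,S}}{\partial \theta_k} = 0,
  \qquad k = 1, 2, \ldots, K, \\
\frac{\partial L}{\partial \lambda}
  &= -\left( P_{FA,S} - \rho \right) = 0 .
\end{aligned}
```

The last condition simply enforces the false alarm constraint $P_{FA,S} = \rho$, while the first $K$ conditions require the marginal gain in detection probability per unit of false alarm probability to equal $\lambda$ for every classifier. Solving the system for a range of $\rho$ values would trace out candidate points of the system ROC curve.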

3.6.4 Brute Force. If no other option is available, one can enumerate a subset of possible
threshold (or $P_{FA}$) combinations and use the frontier of the results. This method is hardly scientific,
but its computational complexity is not so great as to make the method impractical. Depending upon the
complexity of the fusion rule, this method may be more practical than the others. A drawback,
however, is that the resulting frontier is not guaranteed to be the upper bound of the ROC curve.
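A minimal sketch of the brute-force approach for a three-classifier majority vote follows. Since no individual ROC curves are given here, it assumes a binormal model, $P_D = \Phi(d' + \Phi^{-1}(P_{FA}))$, with made-up separation values; it also assumes a within-fusion system of independent classifiers, for which the system operating point is the hostile column of the fusion rule matrix applied to the Kronecker products of the individual CPM columns. The names `binormal_pd` and `system_point` are hypothetical.

```python
import itertools
from functools import reduce
from statistics import NormalDist
import numpy as np

norm = NormalDist()

def binormal_pd(pfa, d_prime):
    # Assumed individual ROC model: P_D = Phi(d' + Phi^{-1}(P_FA)).
    return norm.cdf(d_prime + norm.inv_cdf(pfa))

def system_point(pfas, pds, r):
    # Kronecker product of the "target absent" columns, then of the
    # "target present" columns, of the individual CPMs.
    fa_col = reduce(np.kron, [np.array([1 - f, f]) for f in pfas])
    d_col = reduce(np.kron, [np.array([1 - d, d]) for d in pds])
    return float(r @ fa_col), float(r @ d_col)

# Hostile column of the majority-vote fusion rule matrix for K = 3.
r_maj = np.array([int(sum(v) >= 2) for v in itertools.product((0, 1), repeat=3)])

d_primes = (1.0, 1.5, 2.0)           # assumed class separations
grid = np.linspace(0.01, 0.99, 15)   # candidate P_FA values per classifier

points = []
for pfas in itertools.product(grid, repeat=3):
    pds = [binormal_pd(f, d) for f, d in zip(pfas, d_primes)]
    points.append(system_point(pfas, pds, r_maj))

# Upper frontier: scan by increasing false alarm rate (highest P_D first on
# ties) and keep only points that improve on the best P_D seen so far.
points.sort(key=lambda t: (t[0], -t[1]))
frontier, best = [], -1.0
for fa, pd in points:
    if pd > best:
        frontier.append((fa, pd))
        best = pd
```

The frontier is only as fine as the grid allows, which reflects the drawback noted above: a coarse enumeration may miss threshold combinations that lie above the points it finds.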

3.7 Summary

This chapter provided the following definitions for MCSs in which (a) each classifier sought a
different type of target and (b) each classifier sought the same type of target: Conditional Performance
Matrix (CPM), Joint Performance Matrix (JPM), Prior Probabilities Matrix (PPM), Conditional
State Probabilities Matrix (CSPM), Combined Prior Probabilities Matrix (CPPM), and Joint State
Probabilities Matrix (JSPM). Also provided were derivations of formulae for the system JPM, PPM,
and CPM for within and across fusion systems in which decisions are combined using Boolean fusion
rules. Lastly, the chapter presented several methods for estimating the ROC curve for an MCS in
which decisions are combined using Boolean fusion rules.
IV. Summary and Recommendations


4.1  Overview 

This  chapter  summarizes  the  contributions  of  this  thesis  as  applied  to  Multiple  Classifier 
Systems  in  which  decisions  are  combined  using  Boolean  fusion  rules.  Additionally,  the  chapter  will 
suggest  areas  of  future  research. 

4.2 Summary of Contributions

The  primary  contribution  of  this  thesis  is  a  matrix  algebraic  formula  for  computing  the 
Conditional  and  Joint  Performance  Matrices  for  a  Boolean  Multiple  Classifier  System.  This  thesis 
is  the  first  known  use  of  the  Kronecker  product  for  evaluating  classifier  system  performance.  Also 
presented  were  definitions  and/or  derivations  for  the  following: 

•  Conditional  Performance  Matrix  (CPM), 

•  Prior  Probabilities  Matrix  (PPM), 

•  Joint  Performance  Matrix  (JPM), 

•  Combined  Prior  Probabilities  Matrix  (CPPM), 

•  Conditional  State  Probabilities  Matrix  (CSPM), 

•  Joint  State  Probabilities  Matrix  (JSPM), 

•  Truth  Matrix,  and 

•  Fusion  Rule  Matrix. 

Furthermore,  several  methods  were  presented  for  estimating  an  upper  bound  of  the  ROC 
curve  for  the  MCS  using  the  system  CPM.  The  individual  CPMs  were  used  previously  to  determine 
optimal  fusion  rules  [22];  however,  that  work  did  not  take  into  account  the  possibility  of  varying 
the  decision  thresholds  for  the  individual  classifiers,  nor  did  that  work  provide  a  methodology  for 
analyzing systems fusing across target types. Lastly, this thesis characterizes Sensor Corroboration
rules  that  were  considered  by  Liggins  [18]. 

4.3 Recommendations for Future Research

The results of this research identify several potential areas for further research. First, the
matrix algebraic formula for the system CPM suggests an underlying algebraic structure for Multiple
Classifier Systems. Perhaps this structure can be extended beyond Boolean MCSs to other
types of MCSs (e.g., weighted voting systems).

Second,  future  analysts,  engineers,  and  mathematicians  may  be  able  to  exploit  the  algebraic 
structure  in  such  a  way  as  to  improve  classification  accuracy.  This  might  be  accomplished  by 
developing  more  clever  ways  to  maximize  detection  probability  for  a  given  false  alarm  rate  or 
through  some  other  means  entirely. 

Third, this work took advantage of a commonly (ab)used assumption of statistical independence,
even though statistical independence is not likely for MCSs in which the individuals seek
similar target types. It may be possible to incorporate variance/covariance matrices to provide
more realistic estimates of MCS performance.